Privacy Boundary Determination of Smart Meter Data Using an Artificial Intelligence Adversary

Summary: The roll-out of the new generation smart meter with artificial intelligence (AI)-based data mining algorithms causes serious privacy issues for consumers. By detecting appliance usages, an adversary can easily monitor the behaviour patterns of residents. In this paper, a privacy-preserving smart metering model is proposed; the system utilizes a data aggregator to aggregate the readings of neighbouring smart meters and a data down-sampler to reduce the sensitive information in the load profiles. An AI-based adversary is introduced to simulate the adversarial process. Four state-of-the-art deep learning/machine learning algorithms (convolutional neural network – long short-term memory (CNN-LSTM); gated recurrent unit (GRU); k-nearest neighbours (KNN); and CNN) are employed as data mining algorithms. By tuning the variables (aggregation size α and interval resolution σ), the detectability boundaries of particular appliances are evaluated. Based on the appliance detectability, a three-level privacy boundary (real-time surveillance, presence/absence detection, and complete protection) is obtained. The result shows that to achieve complete data protection, the aggregation size should exceed 40, and the interval resolution should exceed 8 hours. 
number; $X_t^j$, the power consumption of smart meter $j$ at time slice $t$; $X^T$, load profile sequence; $\tau$, original interval resolution; $i$, electricity appliance series number; $N$, total appliance categories; $Y_t^i$, power consumption of appliance $i$ at time slice $t$; $Y^{N\times T}$, appliance profile sequence matrix; $M^T$, modified load sequence; $\mathcal{A}$, adversary model; $\mathcal{P}$, privacy-preserving model; $f_{\mathcal{P}}$, privacy-preserving functions; $f_{\mathcal{A}}(t)$, adversary function; $\alpha$, aggregation size; $\sigma$, downsampled interval resolution; $\gamma$, the ratio of modified interval resolution and original interval resolution; $r_t$, reset gate; $z_t$, update gate; $h_{t-1}$, previous cell state; $h_t$, current cell state; $\tilde{h}_t$, candidate cell state; $g_t$, input node; $i_t$, input gate; $c_t$, internal gate; $f_t$, forget gate; $o_t$, output gate; $\rho$, Pearson correlation coefficient; $\phi$, tanh activation.


Motivation
The smart meter is a new generation electricity measurement device that enables real-time communication between the demand side and the utility. This meter also provides high-granularity electricity data (high-interval resolution data on real-time energy consumption, bills, time-of-use tariffs, etc.) 1 . Moreover, the high granularity of data boosts artificial intelligence (AI) applications in smart grids. AI data analysis and data mining tools (such as machine learning/deep learning) have been widely adopted in smart grid applications, such as short-term load forecasts, renewable energy management, and non-intrusive load monitoring (NILM) 2 . However, smart meters and AI applications are double-edged swords since they introduce severe privacy issues to consumers. By applying AI mining algorithms (such as NILM) to smart meter data, the adversary can easily infer personal information 3 .

Literature review
To protect private information in smart meter data, two categories of approaches are proposed in the literature: demand shaping and data manipulation. Demand shaping techniques mask the ground truth load profiles by utilising extra energy storage facilities (such as a rechargeable battery and renewable energy system). The energy management unit (EMU) controls the energy storage device charge/discharge to fill the gap between the "average daily demand" and "instantaneous demand" to minimise information leakage 4 .
Data manipulation modifies the original smart meter data with informatics techniques before sending the data to the utility 3 . Among all informatics techniques, the data aggregation approach, data distortion approach, and data down-sampling approach are widely discussed in the literature. The data aggregation approach (or spatial aggregation) envisages sending aggregate power measures for a group of smart meters to prevent the utility from distinguishing individual power consumption 5 . The data aggregation scheme introduces a data aggregator (DA) with/without a trusted third party (TTP). To guarantee security during data communication, encryption mechanisms such as homomorphic encryption (HE) 6 and multiparty computation (MPC) 7 are introduced. These advanced encryption algorithms enable third parties to operate on the data without knowing the details of the data. The data down-sampling approach (or temporal aggregation) aggregates the data from neighbouring timestamps 8 . As the interval lengthens, the sensitive information in the load profile decreases 9 .
Empirical methods to quantify the privacy boundary are discussed by N. Buescher et al. 10 and EA Technology 11 . A naïve statistical analysis is implemented in 11 , and three privacy metrics, visual inspection, correlation analysis, and clustering analysis, are proposed in this work to determine the optimum aggregation size. Their result shows that two houses are enough to achieve high-level anonymity. However, another study by N. Buescher shows that challengers can still obtain an advantage with a minimum aggregation size of 100 houses 10 . These conventional methods can only measure the similarity between the individual power consumption and aggregated power consumption rather than privacy leakage; the adversarial model is also not introduced.
Relevant work that utilises AI adversaries to protect privacy includes the differential privacy NILM algorithm, generative adversarial privacy model, and NILM adversarial model. In differential privacy NILM, a differentially private stochastic gradient descent (DP-SGD) mechanism is employed 12 . Random Gaussian noise is added to the gradient of every training step, achieving (ε, δ) differential privacy 13 . M. Shateri et al. 14 introduce an adversarial modelling framework that consists of a data releaser and an adversary. Both the releaser and the adversary utilise recurrent neural networks against each other. The privacy performance of the releaser is improved because of this competition. G. Eibl and D. Engel 15 discuss the relationship between interval resolution and privacy in edge detection-based NILM technology. They find that with intervals under 15 min, which is the sampling frequency adopted by most EU manufacturers, most appliances are still detectable. Although many works have proposed different privacy-preserving smart metering schemes, few studies demonstrate the process of how the adversary obtains valuable information from the load profile. Moreover, there is a lack of information on the correlation of data granularity (e.g., interval resolution, aggregation size) with the sensitive information.

Contributions
Inspired by the generative adversarial network (GAN) proposed by I. Goodfellow in 2014 16 , this work trains an artificial intelligence adversarial model to improve the performance of the privacy-preserving model and further detect the boundary of the privacy-preserving model. The main novelties of this paper are listed as follows:
(1) A privacy-preserving smart metering system that combines a data aggregation approach and a data down-sampling approach is proposed. The system enables functionalities (billing, grid management and operation) and simultaneously protects private information.
(2) This work employs an AI-based adversary model to demonstrate the adversarial process. The adversary can use state-of-the-art convolutional neural network-long short-term memory (CNN-LSTM), gated recurrent unit (GRU), CNN, and k-nearest neighbours (KNN) algorithms to detect appliance usages and further infer the behaviour patterns of the residents.
(3) The influence of two parameters, aggregation size α and interval resolution σ, on the appliance detectability is investigated by simulation. Nine typical appliances that represent three load categories (continuous load, intermittent load, and active load) are included in the study.
(4) A three-level privacy boundary (real-time surveillance, presence/absence detection, complete protection) is presented based on the simulation results.
This benchmark would both help consumers to better understand how safe the smart meters installed in their homes are and assist policymakers in regulating smart meter markets.

Organisation of the paper
The remainder of the paper is organised as follows: The problem formulation is demonstrated in Section 2. In Section 3, the privacy-preserving model, as well as the AI adversary model, is introduced. In Section 4, the implementation process, which includes dataset construction, data preprocessing, and privacy metrics, is illustrated. Three case studies are designed in Section 5 to determine the privacy boundaries of smart meter data, including aggregation size, interval resolution, and the combined effect of these two factors. The conclusion and future works are drawn in the last section.

PROBLEM FORMULATION
Referring to X. Zhang et al. 17 , the privacy intrusion issues raised by smart meters include data sensitivity and algorithm sensitivity. For data sensitivity, real-time high-resolution data (active/reactive power, voltage, time-of-use tariff, etc.) collected by the new generation smart meter provides rich information for adversaries. The adversaries can access the collected smart meter data (e.g., purchase from the energy suppliers or hack into the smart metering system). State-of-the-art data-driven deep learning-based NILM algorithms enable adversaries to extract behaviour patterns based on high granularity data (refer to Figure 1).
In this paper, we denote the power consumption recorded by the smart meter at time slice $t \in \mathcal{T} := \{1, 2, \cdots, T\}$ as $X_t$, and the original interval resolution is denoted as τ.
In conventional smart metering systems, $X_t$ can be decomposed into individual appliance signals $Y_t^i$ via the NILM algorithm implemented by a third party, where $Y_t^i$ is the power consumption of electrical appliance $i$ (ranging from 1 to $N$) at time slice $t$. The load profile sequence is denoted as $X^T$. The appliance profile sequence matrix, which collects $Y_t^i$ for all $N$ appliances and $T$ time slices, is denoted as $Y^{N\times T}$.
The mathematical model that shows the data privacy preservation and adversary inference process is presented in Figure 2. Since $Y^{N\times T}$ contains sensitive information that can be used for behaviour pattern identification, the purpose of the privacy-preserving model $\mathcal{P}$ is to modify the original load profile $X^T$ into a modified load sequence $M^T$ to hide the sensitive information $Y^{N\times T}$. In this paper, two privacy-preserving functions $f_{\mathcal{P}}$ are thoroughly investigated: the data aggregation function and the data down-sampling function, as shown in Section 3. Moreover, the difference between the original load profile and the modified load sequence is measured by mutual information (MI). In contrast, the purpose of the adversary model $\mathcal{A}$ is to infer as much information about $Y^{N\times T}$ from $M^T$ as possible in real time, using the adversary function $f_{\mathcal{A}}(t)$.

Privacy-preserving model
The conventional smart meter has a fixed sampling frequency and directly sends the power consumption data to the utility without any modification. This single-channel smart metering system has a high risk of revealing private information to the energy supplier or third parties. To overcome the drawbacks of the existing smart metering system, a two-channel smart metering system is proposed (refer to Figure 3). The proposed system comprises a data aggregator and a data down-sampler. The data aggregator concurrently aggregates the data of neighbouring smart meters and sends the aggregated data to the grid operator; the grid operator then sends commands to manage and operate the grid. The data down-sampler channel down-samples the data for billing purposes only.

Data aggregation scheme
In the data aggregation scheme, the privacy-preserving function is the data aggregation function. In this scheme, a data aggregator is constructed that aggregates the readings of all smart meters under it. To minimise investments, it is meaningful to quantify the smallest aggregation size that satisfies the privacy requirement. Encryption methods such as HE 18 , zero-knowledge protocols 19 , and MPC 20 are applied to secure communication between smart meters and the aggregator. Detailed encryption algorithms are beyond the scope of this paper.
As shown in Figure 4, $X_t^j$ is the reading of smart meter $j$ ($1 \le j \le \alpha$), where α is the total number of smart meters under the aggregator. At each timestep $t$, the aggregator synchronously aggregates the readings from all smart meters:

$$M_t = \sum_{j=1}^{\alpha} X_t^j \qquad (4)$$

Figure 4 Privacy-preserving data aggregation channel
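The aggregation step described above can be illustrated with a minimal sketch. It assumes that "aggregating" means summing the readings of the α meters at each time step; the function name `aggregate_readings` and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np

def aggregate_readings(readings: np.ndarray, alpha: int) -> np.ndarray:
    """Sum the readings of the first `alpha` meters at each time step.

    readings: array of shape (num_meters, T), one row per smart meter.
    Returns an array of shape (T,) holding the aggregated load.
    """
    if not 1 <= alpha <= readings.shape[0]:
        raise ValueError("alpha must be between 1 and the number of meters")
    return readings[:alpha].sum(axis=0)

# Toy example (hypothetical values in watts): three meters, four time steps.
readings = np.array([
    [100.0, 120.0, 90.0, 300.0],
    [50.0, 60.0, 55.0, 40.0],
    [10.0, 0.0, 500.0, 20.0],
])
aggregated = aggregate_readings(readings, alpha=3)
print(aggregated)  # [160. 180. 645. 360.]
```

With `alpha=1` the function returns the first meter's own profile, which corresponds to the unprotected single-house case.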

Data down-sampling scheme
The interval resolution of the existing smart meter ranges from 5 seconds to 15 minutes depending on the manufacturer 21 . Current NILM algorithms achieve high accuracy even with a low sampling rate 22 . Hence, as a vital variable that influences private information leakage, the privacy boundary of σ should be quantified. The down-sampling channel aims to reduce sensitive information by reducing the interval resolution of the metered data. A simplified down-sampling scheme is shown in Figure 5; the original curve is flattened by taking the average power consumption of several sampling points. We define a down-sampled interval resolution σ, which is an integer multiple of the original interval resolution τ (γ = σ/τ). At the end of each time slice, the down-sampler takes the average value of all data within the time window:

$$M_k = \frac{1}{\gamma} \sum_{t=(k-1)\gamma+1}^{k\gamma} X_t \qquad (5)$$
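As a minimal sketch of the down-sampling step, the following code averages consecutive, non-overlapping windows of γ samples. The function name `downsample` and the choice to drop trailing samples that do not fill a whole window are assumptions made for illustration only.

```python
import numpy as np

def downsample(load: np.ndarray, gamma: int) -> np.ndarray:
    """Average consecutive, non-overlapping windows of `gamma` samples.

    Assumption: trailing samples that do not fill a whole window are dropped.
    """
    if gamma < 1:
        raise ValueError("gamma must be a positive integer")
    n_windows = len(load) // gamma
    return load[: n_windows * gamma].reshape(n_windows, gamma).mean(axis=1)

# Toy profile (hypothetical watts): a short 1.5 kW spike, then a steady load.
load = np.array([0.0, 0.0, 1500.0, 0.0, 100.0, 100.0, 100.0, 100.0])
print(downsample(load, gamma=4))  # [375. 100.]
```

The short spike is flattened to a quarter of its peak, which is the effect the scheme relies on. For data recorded every 3 s and a target resolution of 15 min, the window length would be γ = 900 s / 3 s = 300 samples.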

Long Short-Term Memory (LSTM)
Unlike conventional recurrent neural networks (RNNs), which are designed for short-term memory and perform poorly on long sequences (because of the vanishing gradient), LSTM retains both long-term and short-term information without much loss by introducing a memory cell. Moreover, LSTM has gates that help the memory cell regulate information from the past.
LSTM has recurrent edges that connect adjacent time steps, enabling LSTM to selectively pass information across sequence steps. The structure of a typical LSTM block is shown in Figure 6. As demonstrated in (6)-(8), the components inside the block include an input node $g_t$, input gate $i_t$, internal gate $c_t$, forget gate $f_t$, output gate $o_t$, and output $h_t$. Each gate is a sigmoid unit (output range [0, 1]); it can recognise and pass important information and block unimportant information. Once the input and output gates are closed, the information is held inside the memory cell and does not affect the following time steps until the gates reopen.
The input node and the gates are all functions of the current time-step input and the output of the previous time step, $h_{t-1}$.
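To make the gate interplay concrete, the following sketch implements one forward step of a standard textbook LSTM cell in NumPy. Since equations (6)-(8) are not reproduced here, the update rules, the weight layout (`W`, `U`, `b`) and the function name `lstm_step` are assumptions and may differ in detail from the formulation used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell (assumed formulation).

    W[k], U[k], b[k] hold the input weights, recurrent weights and bias for
    k in {"g", "i", "f", "o"}: input node, input/forget/output gates.
    """
    pre = {k: W[k] @ x_t + U[k] @ h_prev + b[k] for k in ("g", "i", "f", "o")}
    g_t = np.tanh(pre["g"])          # input node (candidate content)
    i_t = sigmoid(pre["i"])          # input gate, values in (0, 1)
    f_t = sigmoid(pre["f"])          # forget gate
    o_t = sigmoid(pre["o"])          # output gate
    c_t = g_t * i_t + c_prev * f_t   # internal state of the memory cell
    h_t = np.tanh(c_t) * o_t         # output passed to the next time step
    return h_t, c_t

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
W = {k: rng.normal(size=(n_hidden, n_in)) for k in "gifo"}
U = {k: rng.normal(size=(n_hidden, n_hidden)) for k in "gifo"}
b = {k: np.zeros(n_hidden) for k in "gifo"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [0.1, 0.9, 0.3]:            # a toy normalised load sequence
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

In this formulation, an input gate near 0 together with a forget gate near 1 leaves the internal state almost unchanged, which is how the memory cell carries information across many time steps.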

NILM-based adversary model
In this paper, a 1D CNN-LSTM NILM model is adopted as the adversary $\mathcal{A}$. The results in Table 2 show that the AI adversary achieves an average accuracy of 83%, which indicates that the adversary is highly capable of detecting behaviour patterns. The aggregation size and down-sampling resolution are increased steadily until the adversary action of $\mathcal{A}$ fails.
(1) Input Data: The collected data are pre-processed and fed into the model.
(7) Dropout Layer: Dropout is an effective regularization method that is employed in neural networks to avoid overfitting. During the training process, the dropout layer randomly sets the outputs of neurons to zero. In this model, we set the dropout rate to 0.5, which means that each neuron's output is set to zero with a probability of 50% at every training step.
(8) Fully Connected Layer: The final layer will reduce the output matrix to a single value between 0 and 1, which is the estimated active power of the targeted household appliance.
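The dropout behaviour in step (7) can be sketched in a few lines of NumPy. This is an illustrative "inverted dropout" implementation (surviving activations are rescaled by 1/(1 - rate), as common deep learning frameworks do), not the paper's code; the function name `dropout` is hypothetical.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate` during
    training and rescale the survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones(10_000)
y = dropout(x, rate=0.5, rng=rng)
print((y == 0).mean())  # fraction of zeroed activations, close to 0.5
print(y.mean())         # close to 1.0 because of the rescaling
```

At inference time (`training=False`) the activations pass through unchanged, so dropout only affects the training process.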

Dataset construction
The data adopted in this paper are the Reference Energy Disaggregation Data Set (REDD) 23 and Pecan Street Dataport (Dataport) 24 ; refer to Table 3. Both datasets contain appliance-level and house-level power consumption data. Hence, not only the load profiles but also the appliance signatures can be obtained from the datasets. The characteristics of the appliances are listed in Table 2. The power threshold in the table is the minimum power needed to start and operate the appliance; when the power is larger than the power threshold, we regard the appliance as "on".
Minimum duration represents the minimum operating hours of a particular appliance throughout the day. The rated power is the highest power input allowed through a particular device.
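The on/off rule above amounts to a simple threshold comparison. The following sketch shows one way to derive binary appliance labels from an appliance-level power series; the function name `on_off_labels` and the 200 W threshold are hypothetical and are not taken from Table 2.

```python
import numpy as np

def on_off_labels(appliance_power, power_threshold):
    """Label each sample 1 ("on") when the power exceeds the threshold, else 0."""
    return (np.asarray(appliance_power) > power_threshold).astype(int)

# Hypothetical appliance trace in watts.
power = np.array([0.0, 5.0, 1200.0, 1500.0, 30.0])
print(on_off_labels(power, power_threshold=200.0))  # [0 0 1 1 0]
```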
Aggregation Size Dataset: Referring to Section 3.1.1 and (4), the houses inside an aggregation group are selected randomly from the two datasets to make up the new dataset.
The new dataset is split into training/testing datasets (90% for training and 10% for testing). The input data of the model are the aggregated power consumption

Data preprocessing
The purpose of data pre-processing is to make the input data more amenable to the model. Typically, data pre-processing consists of vectorization, missing data detection, and data normalisation. Since the input data are already vectorised, only normalisation and missing data detection are required.

Missing data detection
The original data contain some missing values, which would degrade the performance of the model. In this study, we replace all missing values with '0'.
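A minimal sketch of this step, assuming that missing readings are represented as NaN (the function name `fill_missing_with_zero` is hypothetical):

```python
import numpy as np

def fill_missing_with_zero(series):
    """Return a float copy of `series` with NaN entries replaced by 0."""
    values = np.asarray(series, dtype=float)
    return np.where(np.isnan(values), 0.0, values)

raw = [120.0, float("nan"), 95.0, float("nan")]
print(fill_missing_with_zero(raw).tolist())  # [120.0, 0.0, 95.0, 0.0]
```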

Data normalisation
Normalisation is vital for the neural network to converge. In this work, max-min normalisation is adopted to guarantee that all input values range between 0 and 1. The equation of max-min normalisation is shown in (12):

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (12)$$

where max(x) and min(x) represent the maximum value and the minimum value of the data, respectively.
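The normalisation step can be sketched as follows. Mapping a constant series to zeros (to avoid division by zero) is an assumption added for robustness and is not discussed in the paper; the function name `min_max_normalise` is hypothetical.

```python
import numpy as np

def min_max_normalise(x):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption: a constant series carries no information, map it to 0.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

print(min_max_normalise([200.0, 400.0, 1000.0]).tolist())  # [0.0, 0.25, 1.0]
```

In practice, the minimum and maximum would usually be computed on the training set and reused for the test set so that both share the same scale.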

Hardware & software platform
The simulation and computation are implemented on a Dell laptop equipped with a Core i7-7700HQ CPU, NVIDIA GTX 1060 GPU, and 8 GB RAM. The deep learning algorithm runs on Python 3.7, and the TensorFlow 2 framework is adopted to train the DNN model.

Privacy metrics for appliance detection
Once the adversary model is designed, the performance of the adversary should be evaluated and quantified. In this section, we introduce two performance metrics that assess the performance of DNNs.

F-measure (F1 score)
The F-measure is a performance measurement for classification adopted in NILM works and privacy measures 15 . As shown in Table 4, the confusion matrix has four entries (TP, FP, FN, and TN); each entry represents one estimation condition (whether the estimation is correct or incorrect). Based on the matrix, the F-measure can be calculated (refer to (13)). Usually, when the F-measure is smaller than 0.5, the classifier is inadequate.
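As an illustration, the sketch below computes the F-measure from binary on/off labels using the standard definition F1 = 2TP / (2TP + FP + FN), which is assumed here to correspond to (13). The function name `f_measure` and the toy labels are hypothetical.

```python
def f_measure(y_true, y_pred):
    """Standard F1 score for binary labels (1 = appliance on, 0 = off)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention (assumption): return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0

y_true = [1, 1, 0, 0, 1, 0]   # ground-truth appliance states
y_pred = [1, 0, 0, 1, 1, 0]   # states inferred by the adversary
print(round(f_measure(y_true, y_pred), 3))  # 0.667
```

Here TP = 2, FP = 1 and FN = 1, so precision and recall are both 2/3 and the F-measure is 2/3.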

Correlation analysis
The Pearson correlation coefficient ρ is used to measure whether two continuous variables are linearly associated. The value of ρ ranges from -1 to 1 (a positive value indicates a positive correlation, while a negative value indicates a negative correlation); the larger |ρ| is, the stronger the correlation between the two variables. The expression of the Pearson correlation coefficient is shown in (14):

$$\rho = \frac{\sum_{t=1}^{n} (Y_t - \bar{Y})(\hat{Y}_t - \bar{\hat{Y}})}{\sqrt{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}\,\sqrt{\sum_{t=1}^{n} (\hat{Y}_t - \bar{\hat{Y}})^2}} \qquad (14)$$

where n is the sample size, $Y_t$ is the appliance power consumption at time t and $\hat{Y}_t$ is the power consumption generated by the adversary; $\bar{Y}$ and $\bar{\hat{Y}}$ are the mean values of $Y_t$ and $\hat{Y}_t$, respectively. A benchmark is presented for the following analysis process; refer to Table 5. We define an appliance as undetectable when the two metrics, the F-measure and ρ, are lower than 0.2.

This section quantifies the privacy boundary influenced by aggregation size α and interval resolution σ. Two case studies are designed for each parameter; both the detectability of particular appliances and the algorithm sensitivity in the two privacy-preserving schemes are thoroughly investigated. A discussion based on the results is also presented to demonstrate the proposed three-level privacy benchmarks.
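The correlation metric and the 0.2 benchmark used throughout the case studies can be sketched as follows. The function names `pearson` and `is_detectable` are hypothetical, and returning 0 for a constant series is an assumption; the detectability rule follows the text, which treats an appliance as detectable when both the F-measure and ρ are higher than 0.2.

```python
import numpy as np

def pearson(y, y_hat):
    """Pearson correlation between true and inferred appliance power."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    dy, dh = y - y.mean(), y_hat - y_hat.mean()
    denom = np.sqrt((dy ** 2).sum() * (dh ** 2).sum())
    # Assumption: report 0 when either series is constant (rho undefined).
    return float((dy * dh).sum() / denom) if denom > 0 else 0.0

def is_detectable(f1, rho, threshold=0.2):
    """Detectable only when both metrics exceed the benchmark threshold."""
    return f1 > threshold and rho > threshold

true_power = [0.0, 1500.0, 1500.0, 0.0, 0.0]
inferred = [10.0, 1400.0, 1350.0, 50.0, 0.0]
rho = pearson(true_power, inferred)
print(rho > 0.99, is_detectable(f1=0.9, rho=rho))  # True True
```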

Privacy boundary level based on electrical events
Household appliances can be divided into three categories of loads, depending on their characteristics and operating duration 25 . Detailed classifications are described as follows:
Continuous load: A continuous load is a device that consumes energy throughout the day, such as the refrigerator and freezer, as well as the computer and printer in "standby" mode. Since continuous loads are not influenced by residents' activities, these loads contain minimal sensitive information.
Intermittent load: These appliances are not always on but are active long enough to be recorded even by smart meters with the lowest (hourly) resolution, such as air conditioners, electric vehicles, furnaces, and water heaters.
Level III (complete protection): No event is detected by the adversary, or only continuous loads are detected by $\mathcal{A}$. At this level of protection, $\mathcal{A}$ cannot infer any sensitive information from the given data.

Privacy boundary of aggregation size α
This case study focuses on the privacy-preserving aggregation channel illustrated in Section 3.1.1. Recalling the aggregation function in (4), the aggregation size α is an essential variable that influences the performance of $\mathcal{A}$. The purpose of $\mathcal{A}$ is to detect appliance usage given the modified load sequence. As demonstrated in Figures 3 and 4, the precision of detection is evaluated as α increases steadily.

Detectability of particular appliances in an aggregation scheme
$\mathcal{A}$ has high accuracy in appliance detection in a single house, which raises privacy issues related to smart meters. Recall the threshold identified in Table 5; an appliance is defined as detectable when both the F-measure and ρ are higher than 0.2. In this case study, nine typical appliances, which are introduced in Table 2, are examined. By steadily increasing α from 1 to 100, the number of smart meters inside an aggregator is enlarged. The mutual information between the original load profile and the modified load sequence also decreases with an increase in α, which increases the difficulty of the inference process of $\mathcal{A}$. Based on the results shown in Figure 7, a correlation analysis between appliance characteristic properties and adversary detectability is implemented (shown in Table 6). It is observed that the three characteristics show almost equal correlations with adversary detectability (0.44 for Rating, 0.50 for Threshold, and 0.53 for Minimum Duration). To summarise, appliances with high ratings, high thresholds, and long durations (such as AC, FUR, and DRY) require a larger α to blind $\mathcal{A}$.

Sensitivity of algorithms in an aggregation scheme
Besides the CNN-LSTM algorithm adopted in the previous sections, $\mathcal{A}$ can also adopt other machine learning/deep learning algorithms. In this case study, the sensitivity of the algorithms in an aggregation scheme is discussed. Apart from the proposed algorithm, three adversaries that adopt state-of-the-art NILM algorithms, namely GRU 26 , CNN 27 , and KNN 28 , are analysed with reference to previous works. In Figure 9,

Identifying the boundary of interval resolution
The privacy boundary of another critical parameter, interval resolution σ, is discussed in this section. Similar to Section 5.2, two case studies are implemented to investigate the appliance detectability and algorithm sensitivity. The original interval resolution of the dataset is 3 s. By implementing the down-sampling formula in (5), a new dataset with a larger σ is obtained.

Detectability of particular appliances in a data down-sampling scheme
Similar to Section 5.2.1, the detectability of appliances by $\mathcal{A}$ under different σ is discussed in this section. As shown in Figure 5, high granularity smart meter data with a small σ contain more detailed features of the load profile, and $\mathcal{A}$ can easily apply the NILM algorithm and infer private information. From Figure 10, all appliances are highly detectable when σ<5 min, with the exception of MO. Appliances such as MO have a very high rating (1.5 kW), but the operation duration is short (0.025 hours). Hence, when the interval resolution increases, MO becomes challenging to detect.
Referring to Table 6, appliance detectability in the data down-sampling scheme has a high correlation with the minimum duration (0.72), followed by the rating (0.34). Appliances with long operation durations require large σ values to hide sensitive information. For instance, AC requires at least 1 h interval resolution to blind $\mathcal{A}$, and

Sensitivity of algorithms in a data down-sampling scheme
Similar to Section 5.2.2, four adversaries with different algorithms (CNN-LSTM, GRU, CNN, and KNN) are introduced to determine the sensitivity of algorithms in a data down-sampling scheme. As shown in Figure 12, the increase in σ substantially weakens the detectability of all four adversaries. It is important to note that all adversaries still maintain a high inference ability when σ ranges from 15 to 30 min, while the sampling frequencies of most smart meters in the UK are in this range. This result supports our argument that the current smart metering system in the UK is highly vulnerable and can be abused by $\mathcal{A}$. A benchmark of σ=10 h is a safe threshold for the privacy-preserving model against attacks from $\mathcal{A}$.

Combined Effect of Interval Resolution and Aggregation Size
In this section, the combined effect of two parameters, α and σ, on the adversary computing ability is demonstrated. The aggregation size α and interval resolution σ are changed synchronously, and the dynamic variation of two privacy metrics, the F-measure and ρ, is observed. The simulation results are presented in Figure 13, which uses 3D models to show dynamic changes. The detectability recedes rapidly, and both the F-measure and ρ decrease to zero given α > 10 and σ > 30 min.

Discussion
Based on the simulation results and quantification of appliance detectability obtained in the previous sections, three-level privacy boundaries are summarised in Table 7. When α < 20 or σ < 5 h, consumers are at privacy level I, at which they are under real-time surveillance. By detecting appliance signatures of active loads (MO, DW, STO, and DRY), $\mathcal{A}$ can learn the detailed behaviour patterns of the residents inside the house. When 20 ≤ α < 40 or 5 h ≤ σ < 8 h, the consumers are at privacy level II, and $\mathcal{A}$ can infer presence/absence information from intermittent loads (AC, EV, WH, and FUR) but cannot understand complex behaviours inside the house. When 40 ≤ α or 8 h ≤ σ, the consumers are at privacy level III; at this level, consumers are protected entirely and free of privacy concerns. In addition, when the co-effects of the two parameters are considered, the detectability of $\mathcal{A}$ decreases dramatically compared with that for a single parameter. When 10 ≤ α and 30 min ≤ σ, privacy level III is already achieved.
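The boundaries above can be read as a simple lookup. The sketch below is one possible interpretation, not the authors' procedure: each parameter is scored on its own, the combined condition (α ≥ 10 and σ ≥ 30 min) is checked separately, and the highest resulting level is returned. The function name `privacy_level` and the choice of hours as the unit for σ are assumptions.

```python
def privacy_level(alpha: int, sigma_hours: float) -> int:
    """Return privacy level 1, 2 or 3 for an aggregation size and an
    interval resolution given in hours (interpretation of Table 7)."""
    # Boundaries when only the aggregation size is varied.
    level_alpha = 3 if alpha >= 40 else 2 if alpha >= 20 else 1
    # Boundaries when only the interval resolution is varied.
    level_sigma = 3 if sigma_hours >= 8 else 2 if sigma_hours >= 5 else 1
    # Combined effect: alpha >= 10 together with sigma >= 30 min.
    level_combined = 3 if (alpha >= 10 and sigma_hours >= 0.5) else 1
    return max(level_alpha, level_sigma, level_combined)

print(privacy_level(alpha=1, sigma_hours=3 / 3600))   # 1: raw 3 s data, one house
print(privacy_level(alpha=25, sigma_hours=3 / 3600))  # 2: aggregation alone
print(privacy_level(alpha=10, sigma_hours=0.5))       # 3: combined effect
```

Taking the maximum reflects the reading that either channel alone, or the two together, may be sufficient to reach a given level.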

Conclusion
In this paper, a privacy-preserving smart metering model is proposed; the model adopts a data aggregation scheme and a data down-sampling scheme to better protect sensitive information from inference. An AI adversary is then introduced to quantify the privacy boundary (aggregation size and interval resolution) of the smart meter data. The adversary can implement cutting-edge CNN-LSTM NILM algorithms to detect appliance usage from the demand load curve and further identify the behaviour patterns of consumers. Three case studies are employed to investigate the influence of parameters α and σ and the co-effect of α and σ on the appliance detectability of $\mathcal{A}$. From the simulation, three-level privacy boundaries are quantified, showing that to achieve Level III privacy (complete protection), one of the following conditions must be met: (1) 40 ≤ α or 8 h ≤ σ; (2) 10 ≤ α and 30 min ≤ σ.

Implications for Policy
The conclusions obtained in this paper, especially the three-level privacy boundaries, are fundamental to stakeholders in the smart metering system, including consumers, manufacturers, power system operators, and government policymakers. As privacy is abstract and hard to quantify, privacy boundaries are easily understandable and provide an insight that helps people distinguish privacy-free from privacy-concerned smart meter data. New generation smart meters can make further improvements based on the privacy boundaries. In addition, for smart meter data whose granularity falls below the privacy boundaries, extra encryption techniques should be adopted by the utility to guarantee the safety of private information.

Future work
In the future, this work can be extended to the following directions: (1) a combination of the proposed smart metering system with encryption techniques would provide better security and privacy guarantees to consumers; and (2) continuous updating of the privacy boundaries by considering advanced NILM algorithms.