Optimizing the development of the transmission grid is a trade-off between minimizing investments and maximizing operational savings. In expansion planning, the operational cost over the year is usually estimated from a few snapshots of the power system that experts define manually. These snapshots are selected to describe the worst-case scenario and to reflect the critical states of the system, based on the variations of load and intermittent generation, which can lead to over-investment. Besides, too small a set of snapshots may misguide the investment because it does not represent the whole year well, especially for systems with a high penetration of renewables. Considering all time steps in the optimization (a full year of data) would be ideal but would lead to unaffordable computational complexity. Selecting a limited number of representative snapshots appears to be a reasonable compromise, especially for large networks studied over several time horizons and multiple energy scenarios.
The snapshot selection technique described here is built upon a classical clustering algorithm (K-means) applied to a few statistical features that do not depend on the number of nodes in the studied network. As a result, the complexity of the snapshot selection is drastically reduced for large networks.
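To illustrate why a statistical feature scales better than a local one, the sketch below contrasts a local price-difference feature (one value per pair of nodes) with a fixed-length statistical summary. It is only an illustration under stated assumptions: the use of absolute differences, the four statistics (mean, standard deviation, minimum, maximum), the function names and the nodal prices are all hypothetical, since the statistics actually used in the study are not listed here.

```python
from itertools import combinations
from statistics import mean, pstdev


def local_price_differences(nodal_prices):
    """One absolute price difference per pair of nodes: N*(N-1)/2 values."""
    return [abs(a - b) for a, b in combinations(nodal_prices, 2)]


def statistical_price_differences(nodal_prices):
    """Fixed-length summary of the pairwise price differences.

    ASSUMPTION: mean, standard deviation, min and max are illustrative
    choices, not necessarily the statistics used in the original study.
    """
    diffs = local_price_differences(nodal_prices)
    return [mean(diffs), pstdev(diffs), min(diffs), max(diffs)]


# Hypothetical nodal prices for one snapshot of a six-node system.
prices = [30.0, 32.0, 45.0, 30.0, 51.0, 38.0]
local = local_price_differences(prices)
stat = statistical_price_differences(prices)
print(len(local), len(stat))  # 15 local values versus 4 statistical values
```

The length of the local feature grows quadratically with the number of nodes, while the statistical feature keeps the same length whatever the network size.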
Snapshot selection methods based on clustering have been used in studies of security of supply, demand analysis and energy prices, but their application to transmission expansion planning (TEP) remains rare.
The purpose of the method is to organize the data into clusters whose objects present a high degree of similarity, the number of clusters being exactly the target number of snapshots. In each cluster, the most representative point (the so-called medoid) is the one that minimizes an error function summing the distances from all points of the cluster to that representative.
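The selection step can be sketched as follows, assuming Euclidean distance and a plain Lloyd-style K-means written with the standard library only. This is a minimal illustration, not the implementation used in the study; the function names and the toy snapshot data are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means with random initialization; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def medoids(points, labels):
    """Per cluster, index of the point with the smallest summed distance
    to all members of that cluster."""
    result = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        result[c] = min(
            members,
            key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
        )
    return result


# Toy feature vectors: two well-separated groups of four snapshots each.
snapshots = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.4, 0.4),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (10.4, 10.4)]
labels = kmeans(snapshots, k=2)
representatives = medoids(snapshots, labels)
print(representatives)
```

Returning indices rather than coordinates keeps the link to the original time steps, so each representative snapshot can be traced back to an actual hour of the year.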
2. Comparison Methodology
The performance of the clustering algorithm is assessed with a series of indicators while varying two parameters: the number of clusters and the choice of features. The features differ in nature (price differences of electricity, or non-controllable demand and generation) and in the values used (local or statistical). The comparison covers the snapshot partitions obtained and the solutions of the TEP optimizations when using either local or statistical features for snapshot clustering.
Since the clustering algorithm is randomly initialized, the snapshot selection is performed 50 times for each number of clusters (K). The candidate selection and the TEP optimization are then run for each partition obtained, and the indicators are averaged over those runs.
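The averaging over randomly initialized runs could be organized as in the sketch below. The helper `average_indicators` and the stand-in `dummy_run` are hypothetical: in a real study `run_once` would wrap the clustering, the candidate selection and the TEP optimization, none of which is reproduced here, and the random indicator values are placeholders only.

```python
import random
from statistics import mean


def average_indicators(run_once, n_runs=50, base_seed=0):
    """Call run_once(seed) n_runs times and average each returned indicator."""
    results = [run_once(base_seed + r) for r in range(n_runs)]
    return {name: mean(res[name] for res in results) for name in results[0]}


def dummy_run(seed):
    # Placeholder for: clustering -> candidate selection -> TEP optimization.
    # The values below are random stand-ins, not study results.
    rng = random.Random(seed)
    return {"ARI": rng.uniform(0.8, 1.0), "CSI": rng.uniform(0.7, 1.0)}


averaged = average_indicators(dummy_run)
print(averaged)
```

Using one seed per run makes each partition reproducible while still sampling different initializations.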
When comparing local and statistical features, the partition obtained with local features is considered the reference partition, the set of candidates selected with local features is the reference pool, and the result of the TEP optimization obtained on this reference partition is the reference investment solution. The ARI (adjusted Rand index) then measures the accuracy of the “statistical partition”, the CSI captures the accuracy of the “statistical candidates set”, and the ISI measures the accuracy of the investment solution obtained on the statistical partition.
Finally, the sensitivity of the expansion solution (obtained through TEP optimization) to the number of representative snapshots is analyzed, taking the expansion solution without snapshot selection (i.e. including all 8760 snapshots of the year) as the reference solution for all the features. The analysis computes the ISI between expansion solutions with and without snapshot selection, for each feature and for different numbers of representative snapshots.
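For the partition comparison, the adjusted Rand index of Hubert and Arabie can be computed from pair counts, as in the minimal sketch below. The function name and the toy labels are illustrative; the CSI and ISI are not sketched because their definitions are not given in this text.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie, 1985) between two partitions
    of the same snapshots, each given as a list of cluster labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same snapshots")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pair of snapshots to compare
    # Pairs grouped together in both partitions, in A only, in B only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


reference = [0, 0, 0, 1, 1, 1]    # e.g. partition from local features
statistical = [1, 1, 1, 0, 0, 0]  # same grouping, different label names
print(adjusted_rand_index(reference, statistical))  # 1.0
```

The index does not depend on how clusters are numbered, which matters here because two K-means runs have no reason to assign the same label to the same group.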
3. Test Case and Results
The test case is based on the power system proposed by Garver, made of five existing nodes and one new node (with no demand) that models a new production site (e.g. offshore wind). Some transfer capacities are missing, which triggers the need to build new lines. Three injection types are possible at each node: non-controllable demand, non-controllable generation and controllable generation.
Accuracy of the statistical features
The accuracy was measured on the test case for both the “price differences” feature and the “hybrid” feature (which combines uncontrollable generation/demand with nodal price).
Statistical partitions (measured by ARI)
- A 12% difference is observed between local and statistical partitions for the “price differences” feature (ARI of 0.88 when the number of representative snapshots is above 15).
- A very low ARI for the hybrid feature means that different partitions of the snapshots are obtained when local information is replaced by statistical information; the difference seems to grow with the number of clusters.
Statistical candidates sets (measured by CSI)
- The similarity of selected candidates between local and statistical features increases with the number of representative snapshots.
- For the “price differences” feature, 20 representative snapshots are needed to have at least 80% of candidates in common.
- For the hybrid feature, at least 40 representative snapshots are needed to reach that level of 80% of common candidates.
Investment Solution on the statistical partition (measured by ISI)
- For both features (“price differences” and “hybrid”), the agreement error is lower than 15% when the number of representative snapshots is above 35, and the ISI reaches 90% above 45 representative snapshots (see Fig. 1).
- Thus, for this test case, the investments are very similar whether statistical or local features are used to cluster snapshots, even when the snapshot selection results in different clusters (as for the hybrid feature mentioned above).
Figure 1. ISI when comparing local and statistical clustering features
Sensitivity of the expansion solution to the number of snapshots
The results obtained on the test case are expressed as ISI values as a function of the number of snapshots (K).
- The ISI between the reference expansion solution and an expansion solution based on randomly selected snapshots (worst case) is 0.59.
- For the local features, most of the ISI values are above 0.99.
- In contrast, the expansion solutions obtained with statistical features differ markedly, as shown in Fig. 2.
- The ISI values for the “price differences” feature (blue line) show a stable trend, while those for the “hybrid” feature (red line) appear erratic. However, they remain above 0.85, which is still much higher than the ISI obtained from randomly chosen snapshots (0.59).
Figure 2. ISI when comparing statistical clustering features with reference
On the six-zone test case, the use of statistical values instead of local values as clustering features seems to be valid from the investment point of view since the local and the statistical expansion solutions lead to about 85% common candidates. The sensitivity analysis comparing expansion solutions for different values of K to the reference solution (considering all the snapshots) shows very positive results regarding the statistical price-differences feature.
Both analyses lead to the same conclusion: the statistical “price differences” feature shows good results, and expansion solutions using this feature are very close to the one without snapshot selection for values of K higher than 15. This feature uses 4 values to describe each snapshot, instead of one value per pair of nodes, i.e. N(N-1)/2 values for N nodes (15 for the six-node test case), and hence should reduce the complexity of the clustering.
Further studies could include:
- the analysis of the correlation of the proposed clustering features (and other potential ones) with the OPEX or with the output of the TEP optimization;
- the ex-ante estimation of an appropriate number of clusters by analyzing the data to be clustered.
This work is part of the enhanced network expansion planning methodology of the e-Highway 2050 project, supported by the EU Seventh Framework Programme, and is connected to the following e-Highway2050 knowledge articles:
H. Kile, “Evaluation and Grouping of Power Market Scenarios in Security of Electricity Supply Analysis,” 2014.
R. Green, I. Staffell, and N. Vasilakos, “Divide and Conquer? K-Means Clustering of Demand Data Allows Rapid and Accurate Simulations of the British Electricity System,” IEEE Transactions on Engineering Management, vol. 61, no. 2, pp. 251–260, May 2014.
F. Martínez-Álvarez, A. Troncoso, J. C. Riquelme, and J. M. Riquelme, “Partitioning-Clustering Techniques Applied to the Electricity Price Time Series,” in Intelligent Data Engineering and Automated Learning - IDEAL 2007, H. Yin, P. Tino, E. Corchado, W. Byrne, and X. Yao, Eds. Springer Berlin Heidelberg, 2007, pp. 990–999.
S. Lumbreras, A. Ramos, and P. Sánchez, “Automatic selection of candidate investments for Transmission Expansion Planning,” Int. J. Electr. Power Energy Syst., vol. 59, pp. 130–140, Jul. 2014.
L. Hubert and P. Arabie, “Comparing partitions,” J. Classif., vol. 2, no. 1, pp. 193–218, Dec. 1985.
L. L. Garver, “Transmission Network Estimation Using Linear Programming,” IEEE Trans. Power Appar. Syst., vol. PAS-89, no. 7, pp. 1688–1697, Sep. 1970.
S. Agapoff, C. Pache, P. Panciatici, L. Warland, and S. Lumbreras, “Snapshot selection based on statistical clustering for transmission expansion planning,” PowerTech, 2015.
- Sergeï Agapoff, Camille Pache, Patrick Panciatici, RTE, Versailles, France, e-mail: email@example.com
- Leif Warland, SINTEF Energy, Trondheim, Norway, email: firstname.lastname@example.org
- Sara Lumbreras, Institute for Research in Technology, Universidad Pontificia Comillas, Madrid, Spain, email: email@example.com