Paper Conference

Proceedings of uSim Conference 2020: 2nd uSim Conference of IBPSA-Scotland


Urban data imputation using multi-output multi-class classification

Pascal Schetelat, Lucie Lefort, Nicolas Delgado

Abstract: Bottom-up urban simulations tend to require more input data than is available in practice. Even though geographic information system (GIS) descriptions of buildings, roads, networks and terrains are becoming widespread in certain parts of the world, technical descriptions of those urban objects (occupancy, structure, performance...) are either not collected, not published under open data licenses, or available only as small surveys. Given the large number of simulated objects, manual collection is also impractical. A common way to overcome this issue is to associate urban objects with a catalog of manually crafted archetypes. However, such approaches introduce uncertainties that are hard to quantify. This paper proposes a new approach to the data imputation problem by reframing it as a multi-output multi-class classification task. In this task, available urban data are treated as features used to predict the unknown discrete quantities required by the simulation tools. Census and technical surveys are used as training data sets. Raw classification performance metrics for the multi-output multi-class task are evaluated using cross-validation on the training data sets. Dependencies between output classes are evaluated using mutual information. End-to-end performance is assessed by comparing the predicted building thermal properties to open data on thermosensitivity. Two machine learning and statistical models are evaluated as potential classifiers. The approach is demonstrated by predicting missing data for French buildings: GIS and description data are treated as input features, and the French population census and the Phebus survey are used as training data sets.
Pages: 126 - 133
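
The workflow outlined in the abstract (known attributes as features, several unknown discrete attributes as joint targets, cross-validated accuracy per output, mutual information between outputs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature and target names (`footprint_area`, `height`, `construction_period`, `heating_system`) are hypothetical, and scikit-learn's `RandomForestClassifier` is only a stand-in, since the abstract does not name the two models that were evaluated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 600

# Hypothetical known features (stand-ins for GIS-derived attributes).
footprint_area = rng.uniform(40, 400, n)
height = rng.uniform(3, 30, n)
X = np.column_stack([footprint_area, height])

# Hypothetical unknown discrete targets (stand-ins for survey attributes).
# The second target depends on the first, so the outputs are not independent.
construction_period = np.digitize(height + rng.normal(0, 2, n), [10, 20])
heating_system = (construction_period + (footprint_area > 200)) % 3
Y = np.column_stack([construction_period, heating_system])


def cross_validated_accuracy(X, Y, n_splits=5):
    """Return the mean accuracy of each output column over K folds."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in folds.split(X):
        # A random forest accepts a 2-D target and predicts all outputs jointly.
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        model.fit(X[train_idx], Y[train_idx])
        predicted = model.predict(X[test_idx])
        scores.append((predicted == Y[test_idx]).mean(axis=0))
    return np.mean(scores, axis=0)


per_output_accuracy = cross_validated_accuracy(X, Y)

# Mutual information (in nats) between the two target columns.
output_dependency = mutual_info_score(Y[:, 0], Y[:, 1])

print("accuracy per output:", per_output_accuracy)
print("mutual information between outputs:", output_dependency)
```

A mutual information value well above zero indicates that the outputs carry information about each other, which is one reason to model them jointly rather than with fully independent classifiers.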