Scrabble: Transferrable Semi-Automated Semantic Metadata Normalization using Intermediate Representation


Interoperability in the Internet of Things relies on a common data model that captures the semantics needed for vendor-independent application development and data exchange. However, traditional systems, such as building management systems, are vertically integrated and do not use a standard schema. A typical building can contain thousands of data points. Third-party vendors who want to deploy applications such as fault diagnosis must manually map the building information into a common schema. This mapping process requires deep domain expertise and a detailed understanding of the intricacies of each building's system. Our framework, Scrabble, significantly reduces the mapping effort by using a multi-stage active learning mechanism that exploits the structure present in a standard schema and learns from buildings that have already been mapped to the schema. Scrabble uses conditional random fields with transfer learning to represent unstructured building information in a reusable intermediate representation. This reusable representation is then mapped to the schema using a multilayer perceptron. Our novel semantic-model-based active learning mechanism requires only minimal input from domain experts to interpret esoteric, idiosyncratic data points. We evaluated Scrabble on five buildings with thousands of different entities. On one building, when both Scrabble and the prior method are given 10 expert-provided examples, Scrabble outperforms prior work by 59% in Accuracy and by 162% in Macro-averaged F1. Scrabble achieves 99% Accuracy with 100 to 160 examples for buildings with thousands of points, a level the other baselines do not reach.
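Scrabble is evaluated with Accuracy and Macro-averaged F1. The sketch below shows how these two metrics are conventionally computed, in plain Python; the function names and the toy point labels are hypothetical illustrations, not Scrabble's code or data. Because macro-F1 averages per-class F1 without weighting by class size, a model that ignores a rare point type can still score high Accuracy while its macro-F1 drops sharply.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of points whose predicted class matches the expert label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class truly occurs
    f1_scores = []
    for c in classes:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: nine common sensors and one rare setpoint,
# with a model that predicts the common class for every point.
y_true = ["Zone_Temperature_Sensor"] * 9 + ["Zone_Temperature_Setpoint"]
y_pred = ["Zone_Temperature_Sensor"] * 10
print(accuracy(y_true, y_pred))  # 0.9
print(macro_f1(y_true, y_pred))  # about 0.474
```

In this toy case, Accuracy stays at 0.9 while macro-F1 falls below 0.5, because the rare class contributes an F1 of zero with the same weight as the common class.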

Proceedings of the 5th ACM International Conference on Systems for Energy-Efficient Built Environments, 2018