Fundamentally, each ML study predicts some property and comprises three elements: a database, a descriptor, and an ML algorithm. These are combined in two steps. First, the data representation is calculated using the descriptor. Then, the model is iteratively evaluated on this representation and adjusted to improve its performance. Both steps are nearly instantaneous compared to ab initio methods; however, with extensive databases or materials modeled with large supercells (e.g., glasses), compute times can grow to days or even years. We present a tool that can speed up the total process by orders of magnitude by removing its most time-intensive step, i.e., the descriptor calculation.
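The two-step workflow can be illustrated with a minimal sketch. Everything here is a stand-in chosen for illustration and is not taken from this work: the toy database, the `descriptor` function (mean atomic number), and the one-variable least-squares fit are all assumptions.

```python
from statistics import mean

# Hypothetical toy database: each entry holds a structure (a list of atomic
# numbers) and a target property. Targets are constructed as 2 * mean(Z) + 1.
DATABASE = [
    {"structure": [1, 3], "target": 5.0},
    {"structure": [4, 6], "target": 11.0},
    {"structure": [8, 8, 8], "target": 17.0},
]


def descriptor(structure):
    """Toy descriptor (an assumption): the mean atomic number of the structure."""
    return mean(structure)


def fit_linear(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b = y_bar - a * x_bar
    return a, b


# Step 1: compute the data representation with the descriptor.
# In real studies this is the step that dominates the run time.
features = [descriptor(entry["structure"]) for entry in DATABASE]
targets = [entry["target"] for entry in DATABASE]

# Step 2: fit the model on the representation and evaluate it.
slope, intercept = fit_linear(features, targets)
predictions = [slope * x + intercept for x in features]
```

In this sketch only step 2 would be repeated while tuning the model; the `features` list from step 1 does not change between iterations, which is why it is worth storing.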
To accomplish this, we move from the traditional sharing of material-property data alone to also sharing the corresponding descriptor-property data, stored in a NoSQL MongoDB database. This change not only makes machine learning of materials orders of magnitude faster and effortless but also provides a tool for the automated and robust embodiment of prior knowledge about them in a graph-like fashion. Furthermore, since descriptors are often reused for related properties, our database provides a substantial speed-up in design space exploration.
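As a rough sketch of the idea, the descriptor can be stored in the same document as the material and its properties, so that it is computed once and then reused. The document layout, the field names, and the `get_descriptor` helper below are hypothetical, and a plain dictionary stands in for a MongoDB collection so that the example runs without a database server.

```python
# In-memory stand-in for a document collection, keyed by material ID.
# The schema (structure, properties, descriptors) is an assumption.
collection = {
    "Fe3Al": {
        "_id": "Fe3Al",
        "structure": [26, 26, 26, 13],  # atomic numbers, for illustration only
        "properties": {},
        "descriptors": {},
    },
}

calls = {"count": 0}  # counts how often the expensive step actually runs


def expensive_descriptor(structure):
    """Toy placeholder for a costly descriptor calculation."""
    calls["count"] += 1
    return [sum(structure) / len(structure), float(len(structure))]


def get_descriptor(collection, material_id, name, compute):
    """Return the stored descriptor, computing and storing it only if missing."""
    doc = collection[material_id]
    cached = doc["descriptors"].get(name)
    if cached is not None:
        return cached
    vector = compute(doc["structure"])
    doc["descriptors"][name] = vector  # persist alongside the material
    return vector


# The first request computes the descriptor; the second one reuses it,
# for example when modeling a different but related property.
first = get_descriptor(collection, "Fe3Al", "toy_mean_z", expensive_descriptor)
second = get_descriptor(collection, "Fe3Al", "toy_mean_z", expensive_descriptor)
```

With a real MongoDB deployment accessed through PyMongo, the lookup and the write would presumably map onto `find_one` and `update_one` with a `$set` operator, though the actual schema used by the database described here may differ.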