Save the Data! An Intelligent Approach to Avoid Data Loss

Abstract

Data loss can harm customers, business strategies, and a company's reputation. While enterprise environments commonly employ data replication technologies such as RAID, small businesses and customers rely on the lifetime of their storage devices, mostly hard drives. Thus, when these hard drives fail, massive data loss may occur. When important data is at stake, being aware of possible disk failures is crucial. To this end, hard disks use SMART technology to try to detect failures. These analyses, however, are carried out only when the operating system requests them or during the boot process. Moreover, these predictions are not very accurate, exhibiting low accuracy and high false positive rates. To address these problems, we propose a machine learning approach to detect hard drive failures. We use a large, recent dataset from Backblaze. Decision trees achieved the best performance, with 80% accuracy and a false positive rate below 12% in failure predictions.

Publication
XIV Encontro Nacional de Inteligência Artificial e Computacional