Google has long worked on proactively detecting software defects in key open source projects. It has now emerged that the company is also developing SiliFuzz, a system for detecting defects in processors.
SiliFuzz checks how a processor behaves by executing pre-prepared test data collected with the help of emulators. This is a form of fuzzing: the processor is loaded with “random” calculations, and the result is checked at the output. If there is a mismatch, the processor is considered defective.
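The idea of replaying pre-recorded tests and comparing results can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SiliFuzz's actual code: `Snapshot`, `execute`, `build_corpus` and `find_mismatches` are hypothetical names, and a seeded integer workload stands in for real machine instructions and for the emulator that records the expected results.

```python
import random
from dataclasses import dataclass

MASK = (1 << 64) - 1  # keep the toy state within 64 bits, like a register


@dataclass(frozen=True)
class Snapshot:
    seed: int      # determines the "random" mix of operations (assumption)
    expected: int  # end state recorded ahead of time on a trusted reference


def execute(seed: int, steps: int = 1000) -> int:
    """Stand-in for running one test: a deterministic integer workload."""
    rng = random.Random(seed)
    state = seed & MASK
    for _ in range(steps):
        operand = rng.getrandbits(64)
        op = rng.randrange(3)
        if op == 0:
            state = (state + operand) & MASK
        elif op == 1:
            state = (state * (operand | 1)) & MASK
        else:
            state ^= operand
    return state


def build_corpus(seeds, reference=execute):
    """Record the expected end state of every test on the reference."""
    return [Snapshot(seed, reference(seed)) for seed in seeds]


def find_mismatches(corpus, run=execute):
    """Replay each test and return the seeds whose result differs."""
    return [snap.seed for snap in corpus if run(snap.seed) != snap.expected]


corpus = build_corpus(range(20))
print(find_mismatches(corpus))  # a healthy "processor" reports no mismatches
```

Passing a deliberately faulty `run` function, for example one that flips a bit for a single seed, makes that seed show up in the returned list, which is the signal that would mark a core as suspect.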
The system is primarily designed to detect electrical defects in chips that may arise during manufacturing, assembly, operation and so on. The focus is on these defects rather than on logic errors in the processor design itself. The tests do not rely on any low-level debugging mechanisms, which allows them to be run on “live” systems.
The main challenge for the developers is to build a system that can regularly test every core of every Google server with minimal impact on performance. In its current form, SiliFuzz picks a moment when the load on a given machine is not too high and sequentially tests groups of four threads (2 cores with SMT), taking no more than two minutes. For now, the developers are focusing on x86-64 processors, which Google itself uses widely.
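A rough Python sketch of that scheduling policy follows. It rests on several assumptions that are not in the article: the function names, the callbacks `current_load` and `test_group`, and the 0.3 load threshold are all hypothetical, and the two-minute limit is read here as a per-group budget.

```python
import time


def thread_groups(thread_ids, group_size=4):
    """Split hardware threads into groups of four (2 cores with SMT)."""
    return [thread_ids[i:i + group_size]
            for i in range(0, len(thread_ids), group_size)]


def run_when_idle(thread_ids, current_load, test_group,
                  max_load=0.3, budget_s=120.0, clock=time.monotonic):
    """Test each group in turn, but only while the machine is lightly loaded.

    current_load() returns a load fraction in [0, 1]; test_group(group,
    deadline) is expected to stop before `deadline`. Both are hypothetical
    callbacks supplied by the caller.
    """
    tested = []
    for group in thread_groups(thread_ids):
        if current_load() > max_load:
            continue  # machine is busy, leave this group for a later pass
        deadline = clock() + budget_s
        test_group(group, deadline)
        tested.append(group)
    return tested


# Example with stub callbacks: an idle 8-thread machine.
done = run_when_idle(list(range(8)),
                     current_load=lambda: 0.1,
                     test_group=lambda group, deadline: None)
print(done)
```

Skipping a busy group instead of waiting keeps the checker from competing with production work, at the cost of needing another pass to cover the groups that were left out.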
The main goal of the project is to automate the detection of hidden defects that lead to miscalculations. These are much more dangerous than outright failures and crashes, since even small deviations in a chip's operation can lead to an accumulation of errors. In some cases, the difference was less than 0.0000003%, but even that may be enough to cause serious problems.
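To see how such a small deviation can grow, consider a purely illustrative model: assume the same relative error is introduced on every step of a long chain of multiplications and compounds. That assumption is not from the article, which only reports the size of the observed difference, and the function name is hypothetical.

```python
import math


def compounded_relative_error(per_step_error: float, steps: int) -> float:
    """Relative drift after `steps` multiplications, each off by `per_step_error`."""
    # (1 + e)**n - 1, computed via log1p/expm1 to stay accurate for tiny e.
    return math.expm1(steps * math.log1p(per_step_error))


PER_STEP = 0.0000003 / 100  # the article's 0.0000003 %, written as a fraction

for steps in (1, 1_000, 1_000_000):
    drift = compounded_relative_error(PER_STEP, steps)
    print(f"{steps:>9} steps: relative drift {drift:.3e}")
```

Under this model, a million dependent steps turn a deviation of 0.0000003% into a drift of roughly 0.3%, which is large enough to matter in many calculations.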
About 45% of the defects found with SiliFuzz are not caught by other tools. In the future, the developers plan to expand SiliFuzz, make it faster and improve the overall quality of its results.