Monday, May 8, 2017

Analyzing Predictions to find *bugs*

Testing should be one of the most frequent steps in the software development life cycle (SDLC) for any code, whether it builds a website, powers a smart device, or trains a machine learning model. Tests of machine learning code can reveal many issues: transformations that fail because they operate on the wrong columns, or data that violates the assumptions of the learning algorithm. The list is long, and we are well attuned to this process. One test that is often ignored is comparing the test predictions produced during training with the actual values from the data, and then working out what went wrong and where. This can often reveal attributes or column values that you have ignored so far and that should play a bigger role.

For example, plot the predicted values against the real values. If the model were perfect, the points would fall on the straight line y = x. Any deviations from that line should be analyzed, column by column, for the rows where they occur.
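The original function is not reproduced in this post, so here is a minimal sketch of what such a check might look like. It assumes matplotlib is available. The names `plot_predicted_vs_actual` and `largest_errors`, and the default output file name, are hypothetical and chosen only for illustration.

```python
def largest_errors(actual, predicted, top_n=5):
    """Return (row_index, actual, predicted, residual) for the worst predictions."""
    rows = [(i, a, p, p - a) for i, (a, p) in enumerate(zip(actual, predicted))]
    # Sort by absolute residual, largest first.
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows[:top_n]


def plot_predicted_vs_actual(actual, predicted, path="predicted_vs_actual.png"):
    """Scatter predictions against actual values with a y = x reference line."""
    # Imported here so largest_errors stays usable without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    lo = min(min(actual), min(predicted))
    hi = max(max(actual), max(predicted))

    fig, ax = plt.subplots()
    ax.scatter(actual, predicted, alpha=0.6)
    # Dashed diagonal: where every point would sit if predictions were perfect.
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_predicted_vs_actual(y_test, y_pred)` writes the scatter plot to a file, and `largest_errors(y_test, y_pred)` returns the row indices furthest from the diagonal. Looking up those rows in the original data is one way to see which column values the badly predicted rows have in common.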
