Tukeys Blog: Analyzing Predictions to find *bugs*

Monday, May 8, 2017

Analyzing Predictions to find bugs

Testing should be one of the most frequent steps in the software development life cycle (SDLC) for any code, whether it builds a website, powers a smart device, or trains a machine learning model. Tests of machine learning code can reveal many issues, such as transformations that fail because they operate on the wrong columns, or data that violates the assumptions of the learning algorithm. The list is long, and most of us are well attuned to this process. One test that is often skipped is comparing the model's predictions on the held-out test data with the actual values, and then working out what went wrong and where. This often reveals attributes or column values that you have ignored so far and that should play a bigger role.

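For example, use the following function to plot the predicted values against the real values. If every prediction were perfect, the points would fall on a straight diagonal line (predicted = real). Examine the points that deviate from that line, looking at every column of the corresponding records.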

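	import matplotlib.pyplot as plt
	import pandas as pd
	from sklearn.model_selection import train_test_split

	def pltpredictions(real, predictions):
	    # Scatter real vs. predicted values; perfect predictions lie on the diagonal
	    plt.plot(real, predictions, '.')
	    plt.title("Analyzing our predictions")
	    plt.xlabel("Real Prices")
	    plt.ylabel("Predicted Prices")
	    plt.show()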

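	# Sample snippet: hold out 10% of the data and plot predictions against real values.
	# predictors and targets are the feature DataFrame and target Series defined elsewhere.
	pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, targets, test_size=.1)
	# clf is assumed to be an estimator that has been fitted on (pred_train, tar_train)
	y_pred = clf.predict(pred_test)
	pltpredictions(tar_test, y_pred)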

	# Put the predicted and real values in a spreadsheet, with extra columns for the
	# difference and the absolute percentage error between them, sorted so the worst
	# predictions come first. Review the records in this file to identify hidden bugs
	# in your model. Note that the rows come from the held-out test split.

	train_predictions_df = pd.DataFrame({'Prediction': y_pred, 'Real': tar_test, 'Diff': y_pred - tar_test,
	                                     'Percentage': 100 * abs(y_pred - tar_test) / tar_test},
	                                    index=pred_test.index)
	train_predictions_df.sort_values('Percentage', inplace=True, ascending=False)
	train_predictions_df.to_csv('../data/HousePrices_train_predictions.csv')
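
One way to follow up on the sorted error file is to join the errors back to a single predictor column and average them per value, so that a category the model handles badly stands out. The sketch below is only an illustration under stated assumptions and is not part of the original gist: the helper `error_by_column_value`, the `Neighborhood` column, and the sample numbers are all made up.

```python
import pandas as pd


def error_by_column_value(predictors, real, predicted, column):
    """Mean absolute percentage error for each distinct value of `column`.

    Hypothetical helper: `predictors` is a DataFrame, `real` a Series sharing
    its index, and `predicted` a sequence in the same row order as `real`.
    """
    errors = pd.DataFrame(
        {
            "Real": real,
            "Prediction": pd.Series(predicted, index=real.index),
            column: predictors.loc[real.index, column],
        }
    )
    errors["Percentage"] = (
        100 * (errors["Prediction"] - errors["Real"]).abs() / errors["Real"]
    )
    # Average the error per column value and keep the row count for context
    summary = errors.groupby(column)["Percentage"].agg(["mean", "count"])
    return summary.sort_values("mean", ascending=False)


# Made-up data: the predictions are badly off only for Neighborhood "C".
predictors = pd.DataFrame(
    {"Neighborhood": ["A", "A", "B", "B", "C", "C"]},
    index=[10, 11, 12, 13, 14, 15],
)
real = pd.Series([100.0, 200.0, 150.0, 250.0, 300.0, 400.0], index=predictors.index)
predicted = [110.0, 190.0, 150.0, 250.0, 150.0, 600.0]

summary = error_by_column_value(predictors, real, predicted, "Neighborhood")
print(summary)
```

A value that sits at the top of this summary with a reasonable row count is a candidate for closer inspection, for instance a category that was encoded incorrectly or dropped during preprocessing. Numeric columns would need to be binned first (for example with `pd.cut`) before grouping in this way.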
