Thursday, July 6, 2017

Learning TensorFlow

TensorFlow is Google's deep learning library, released in 2015. At first glance it may be puzzling why we should pay attention to it when there are already so many machine learning frameworks around that seem to be doing a pretty good job!

The following points should provide good motivation:

- TensorFlow supports GPUs.
- TensorFlow supports distributed computation.
- TensorFlow is primarily focused on deep learning, much more so than general-purpose ML frameworks.

Note that TensorFlow is, in one sense, analogous to the numpy module in Python: it provides low-level numerical primitives. There is a lot of development still going on, and hopefully easy-to-use libraries in the spirit of scikit-learn will be available on top of it soon.

One might also point out that Apache Spark provides distributed computation, has an ML library, and supports GPUs as well. So why not just use Spark by itself? The answer may be that Spark is not as focused on DL as TensorFlow is. Moreover, the distributed computation model of Spark is very different from TensorFlow's. Spark has a resource manager, hidden from the user, that parallelizes an RDD computation over a cluster; in TensorFlow's distributed programming model, the user is involved and the program has far more control over how the computation is carried out. IMO, Spark may sit in the data pipeline ahead of TensorFlow, to massage, clean, and process the data that is then used to train a very large neural network. At this point, TensorFlow needs considerable simplification of its cluster management and programming API before it can be used comfortably by data scientists accustomed to working with tools like numpy, R, or Spark.
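The numpy analogy only goes so far: numpy evaluates each operation immediately, whereas TensorFlow (in its current 1.x API) first builds a dataflow graph and only computes values when the graph is run in a session. Here is a toy sketch of that deferred-execution idea in plain Python. This is not real TensorFlow code; the `Node`, `constant`, and `run` names are invented for illustration, loosely mirroring `tf.constant` and `Session.run`:

```python
# Toy sketch of TensorFlow-style deferred execution: operations build a
# graph of nodes first; values are only computed when the graph is run.
class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # 'const', 'add', or 'mul'
        self.inputs = inputs  # upstream Node objects
        self.value = value    # payload for constant nodes

    def __add__(self, other):
        return Node('add', (self, other))

    def __mul__(self, other):
        return Node('mul', (self, other))

def constant(value):
    """Create a leaf node holding a fixed value (roughly tf.constant)."""
    return Node('const', value=value)

def run(node):
    """Evaluate a node by recursively evaluating its inputs (roughly Session.run)."""
    if node.op == 'const':
        return node.value
    args = [run(n) for n in node.inputs]
    if node.op == 'add':
        return args[0] + args[1]
    if node.op == 'mul':
        return args[0] * args[1]
    raise ValueError('unknown op: %s' % node.op)

# Building c only *describes* the computation; nothing is evaluated yet.
a = constant(2.0)
b = constant(3.0)
c = a * b + a   # graph: add(mul(a, b), a)

print(run(c))   # only now is the graph executed -> 8.0
```

Having the whole computation available as a graph before anything runs is exactly what lets the real TensorFlow runtime place operations on GPUs or partition them across a cluster.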

Here are some good talks and links to understand TensorFlow better: