Sunday, May 6, 2018

The real impact of the All of Us Program

Today, May 6, happens to be the national launch day for NIH's All of Us program. The program focuses on precision medicine - an emerging approach to disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person (definition from NIH).

Precision medicine is not a new concept. If you google it, you can find several documented cases, especially in cancer treatment, where the genomic information of a single patient was used to successfully target his or her disease. The Director of the All of Us program, Eric Dishman, happens to be such a case and is the kind of leader the program needs to make sure everyone can benefit from it.

Using genomic data to precisely understand and define medical treatment for an individual seems logical. But how can we generalize this and widen its reach? How can the treatment of one person at one end of the world have a positive impact on a patient at the other end? How can we connect the dots? Can we borrow the innovation Facebook produced for ad targeting and apply it to our own healthcare? Can we target medicines or health treatments to an audience instead of ads?

The answer is of course a resounding YES. It is resounding enough that All of Us is not the first program to attempt it. So why would it be more successful than other data collection programs? It has many good things going for it. Besides large national funding and a passionate team behind it, it attempts to engage communities, capture diversity, and use all available technology.

As I wrote in my previous article, the power of using data to complement today's medicine is huge. The All of Us program extends the scope of this power by drawing on a much wider data source: data collected from a lot of people, and many types of data beyond just genomic data. It targets not just patients but also healthy people. So why is all of this necessary if we can just take someone's genome and use it to tweak their treatment? Think of what Facebook does by collecting large amounts of data from a large set of people with diverse behavior. Think of how Alexa learns to do smart things from the questions asked by a large number of people. Perhaps a quick lesson from machine learning would be useful here.

Machine learning from data involves looking for patterns to see why certain choices, events, or input parameters led to a set of outcomes. Finding these patterns requires diverse data; otherwise machine learning can go very wrong. Data skewed towards certain outcomes can lead to very poor models. In addition, the presence of redundant inputs or the absence of key inputs leads to poor models too. Having too little data is also not helpful and can produce models that fail in the real world. Data scientists often combine the results of many models, each built on a different set of data, to arrive at a more robust real-world model.
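To make the last point concrete, here is a minimal, self-contained sketch of that idea: several deliberately simple models, each trained on a different random slice of noisy data, combined with a majority vote (a bagging-style ensemble). All names and the toy "threshold classifier" are invented for illustration; real work would use a proper library.

```python
import random

random.seed(0)

# Toy data: the true outcome is 1 when x > 0.5, but 20% of labels are noisy.
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

# A deliberately simple "model": pick the threshold that best fits its sample.
def fit_threshold(sample):
    candidates = [i / 20 for i in range(21)]
    def accuracy(t):
        return sum(int(x > t) == y for x, y in sample) / len(sample)
    return max(candidates, key=accuracy)

train = make_data(300)
test = make_data(1000)

# Train several models, each on a different random slice of the training data.
models = [fit_threshold(random.sample(train, 100)) for _ in range(9)]

def ensemble_predict(x):
    votes = sum(int(x > t) for t in models)
    return int(votes > len(models) / 2)

single_acc = sum(int(x > models[0]) == y for x, y in test) / len(test)
ens_acc = sum(ensemble_predict(x) == y for x, y in test) / len(test)
print(f"single model: {single_acc:.2f}  ensemble: {ens_acc:.2f}")
```

Each model sees a different slice of the data, so their individual mistakes tend to differ; the majority vote washes many of them out, which is exactly why diversity of data matters.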

That is precisely why the program attempts to collect a diverse set of data from a diverse set of healthy and non-healthy individuals and make it available to a diverse set of researchers. This could fundamentally change how medicine works in a few years. Think of how much machine learning technologies have impacted our day-to-day work in the last ten years, and how cheap it has become to research and deliver the benefits! Very simple health-related outcomes could be achieved within the next couple of years. More complex cloud-based solutions that use this data to buy health benefit programs "as a service" could evolve in a few years!

Saturday, May 5, 2018

So really what is the All of Us Research Program?

It is Saturday, and I was stopped in my tracks several times while trying to finish my chores. Tomorrow (Sunday) is the big launch. The big launch of what, they asked? They were messaging me all over the place on social media. Just google it, I said.

It is finally here - the big day, May 6, when NIH launches nationally a program that has been in gestation for several months with thousands of beta participants. The program aims to enroll one million participants who volunteer their data and other information so researchers can use it to improve medicine.

I had no idea about this program until the middle of last year, when I came across Vibrent (based in NoVA) and met its CEO, Dr. Praduman Jain, who went on to explain the benefits and goals of precision medicine to me very passionately. And then it finally dawned on me.

Well, that was then. Fast forward almost four months and here we are. We have been so busy building the functionality and infrastructure for the program that it has been all too easy to forget what the program is about!

A few months ago, I read an article by Chris Anderson, former editor in chief of Wired magazine, that I thought applied very well here: “The end of theory: the data deluge makes the scientific method obsolete”. As the title indicates, Anderson asserted that in the era of petabyte information and supercomputing, the traditional, hypothesis‐driven scientific method would become obsolete. No more theories or hypotheses, no more discussions of whether the experimental results refute or support the original hypotheses. In this new era, what counts are sophisticated algorithms and statistical tools that sift through massive amounts of data to find information that can be turned into knowledge (source: NIH).

The All of Us Research Program can be instrumental in this process. Today, scientific research happens in the laboratory: a sample or specimen is observed, tests are developed, and then trials are run before approvals and mass rollout. This is still the vast majority of cases. Behavioral scientists have of course been using data in their research regularly, and there are a few cases where actual patient data was used to build a specific medicine formula that was then given back to the patient. The All of Us Research Program has the potential not only to lower the costs of medical research but also to increase the efficacy of medicine.

Consider the long tail problem in developing medicines. The All of Us Research Program can have a huge impact on solving this issue. If I were to give an analogy, this program has the ability to do what fiber optic cables did for the internet - and I am not talking about the dot-com crash! I am referring to the immense possibilities of breaking down and then solving a very complex and large problem. Imagine the collaboration you could get if this kind of data were available across national boundaries and could be used and compared on demand by researchers all over the globe. This could completely democratize the problem of finding cures for diseases and help develop a very competitive market for medicines in the next 20-30 years. Hopefully that happens in my lifetime!

PS: This blog describes my personal viewpoint on the All of Us Program. For the official definition or to sign up for the program, please visit the NIH Join All of Us website or download the apps from the App Store or Play Store. If you are interested in the crazy engineering that goes into building this, join Vibrent Health. We are on the lookout for passionate (and crazy) engineers who want to change the world.

Tuesday, July 18, 2017

Can our payment institutions innovate?

Bank earnings season is rolling in with Q2/2017 results. Most major banks, including Bank of America, Wells Fargo, and Goldman Sachs, have lowered their guidance for the rest of the year, citing lots of reasons - trading, market making, interest rates, etc. The mortgage and lending business also seems to be under pressure. At the same time, Netflix reported record subscriber growth in Q2/2017 while also forecasting strongly negative free cash flows for the foreseeable future. The market was happy with that and sent the stock surging. It's just part of doing business, right? That sounds too bad for U.S. banks, which are considered fundamental pillars of the economy. Fundamental enough that taxpayers bailed them out a few years back, but still not allowed to innovate.

The bitter truth is that innovation in core parts of the economy can make the economy unstable. Innovation implies risk, and risk materializes once in a while. So we allow innovation to happen on the fringes of the economy - fringe enough that if we lose it, not many people would notice.

Back to FinTech, or innovation in financial services. In the U.S. and EU, FinTech is helping evolve the micro-lending and online banking industries. The pace is much more rapid in developing parts of the world, though, like Africa. Still, we may be surprised to hear that the U.S. is a laggard when it comes to financial innovation. The reason is not that innovators in the U.S. are not smart enough - they simply focus on other important problems like talking robots, space travel, and self-driving cars.

So what is going on with FinTech that should worry us? Here is a news flash - China is fast becoming the new FinTech capital of the world, taking that title away from the western world. The transaction volumes processed by Baidu, Tencent, and Alibaba are growing so fast that they are set to pass the volumes Visa and Mastercard process in Europe and the U.S. in 2017. China's mobile payment system does not use any high technology, though. It uses bar codes to process payments! That is far less sophisticated and less secure than the system used by Mastercard and Visa (EMVCo) or by Google Pay/Apple Pay. One thing it gets right, though: it is fast, convenient, and cheap, and that is what matters. It has connected about a billion people in China who never had bank accounts or credit history before. So why should we be worried? Chinese companies are slowly importing this system into the U.S. and EU through partnerships and deals with processors like Stripe and companies like Airbnb. They are so big and powerful that U.S. companies have little choice but to accommodate them. That could slowly cut into the western financial system and cause huge disruption in these economies over the next ten years.

What can we learn from the success of the Chinese system? First, it processes transactions without communicating with the banks each time. If the transaction amount has a limit and replication is fast enough, this works, and the result is a system that is very fast for high-volume processing. It is cheap and convenient because there is no need for expensive NFC hardware, scanning devices, or chips/cards. Compare that with our system: there are two banks involved, we use a dedicated transmission network for payments, and there are middlemen involved. More middlemen mean more fees and commissions, including the need to keep reconciling data as it passes through all the intermediate systems. Chip cards make things even slower. Evolution in our system is not going to be easy, since the credit card companies have a tight grip on end users through rewards and points systems.
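A minimal sketch of the "approve locally, settle later" idea described above: small payments are authorized against a cached balance with no bank round-trip and queued for batch settlement, while large payments fall back to the slow path. The class and limit here are hypothetical, purely to illustrate the flow, not how any real payment network works.

```python
OFFLINE_LIMIT = 50.00  # per-transaction cap for offline approval (illustrative)

class Wallet:
    def __init__(self, cached_balance):
        self.cached_balance = cached_balance
        self.pending = []          # approved transactions awaiting settlement

    def pay(self, amount):
        if amount > OFFLINE_LIMIT:
            return "contact bank"  # large payments take the slow, online path
        if amount > self.cached_balance:
            return "declined"
        self.cached_balance -= amount
        self.pending.append(amount)
        return "approved"          # no bank round-trip needed

    def settle(self):
        """Replicate queued transactions to the bank in one batch."""
        batch, self.pending = self.pending, []
        return batch

w = Wallet(cached_balance=80.00)
print(w.pay(12.50))   # approved
print(w.pay(200.00))  # contact bank
print(w.settle())     # [12.5]
```

The speed comes from the first branch: as long as a transaction is under the limit, approval is a local operation, and the bank only sees a batched, asynchronous replication of what happened.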

What can save us from certain failure is the need for innovation - the U.S. still has around 50 million people without bank accounts and thus no credit cards. PayNearMe tapped into this market by offering an easy way to make cash-based payments where credit cards were not an option. So who is going to get to it first?

How social networking combined with NLP analytics is helping expand the economy

Recently I have been researching the rise of NLP again. This was the topic of my bachelor's thesis in 1995, more than 20 years ago, and it has become a hot area of research again in the last 5 years. The science and tools have evolved, and a lot of new open source tools like NLTK are available to researchers.

Clearly, the early users of social networking data were doing a lot of sentiment analysis on it to determine trends for companies, products, politics, etc. Things have changed now - governments are interested in scouring the billions of bytes of data generated daily on social networks for intelligence hints, and FinTech upstarts are starting to successfully use the same data to disrupt financial services - for example, the lending industry.
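For readers who have not seen sentiment analysis up close, here is a toy lexicon-based scorer of the kind those early analyses were built on: count positive and negative words and compare. The word lists are invented for illustration; real systems use large curated lexicons (such as NLTK's) and trained models.

```python
# Tiny, made-up sentiment lexicons for illustration only.
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "poor"}

def sentiment(text):
    """Score a text by counting positive vs negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this product and it is great"))  # positive
print(sentiment("terrible service and awful support"))   # negative
```

Run over millions of posts mentioning a company or product, even something this crude yields a trend line - which is why social data became valuable well beyond marketing.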

A recent acquisition by Airbnb piqued my interest in this. Research revealed a whole bunch of companies and existing patents in the science of using social network data to determine a person's trust or risk score. It appears a lot of tricks have evolved in the last five years. However, this is quite scary as well! In the next few years, I can see a lot of people trying to use this score instead of just pulling credit scores in business transactions. This could be landlords, people selling goods on craigslist (e.g. cars), or small businesses engaging in seasonal hiring to keep costs low. It could certainly replace simple background checks. While this is all great, the scary part is that it presents a Kafkaesque situation for the people at the other end of it. Unlike their FICO scores, there is a lot less visibility into how the machine learning algorithms work. Imagine a customer service rep trying to explain why your social risk score was considered too high when renting an apartment for a few days on Airbnb.

This led me to findings about how the big Chinese payment companies (Ant Financial, Baidu) are now planning to use all the data they collect from repeated payment transactions and the social networks they own to determine people's credit scores. The idea is definitely not new, but it has been breaking new ground in the last couple of years. For example, companies like HelloSoda and Guardian Analytics have been doing risk scoring for quite some time now. There are also a lot of banking and lending upstarts in the U.S. and EU - Kabbage, Simple, Moven, Fidor, etc. However, FinTech has taken social networking data to a new level, and this time it is not just to send marketing and sales offers (as the famous American Express examples show) or to make your banking very cool indeed.

There are multiple innovation areas. Companies like Jumo have made significant penetration into the underbanked markets of Africa to enable micro-lending based on the social networking data they collect. In China, Baidu already has more than 10% of its assets involved in some kind of lending, the biggest being in the education market. Now Alibaba and Tencent are getting involved too. Not only is the mobile payments market in China set to pass the transaction volumes Visa and Mastercard process, it is also spinning out new uses of the data collected. In a way, it looks like social networking companies may have an edge in managing risk and may have slowly started to disintermediate traditional lending companies. There are 800 million people using Tencent for payments who have no credit history with the central bank, and their daily transaction patterns reflect a lot about their behavior and risk. Take a look at AutoGravity: users can not only select cars and schedule a test drive, but once they connect their social networking account, the site also prefills the application, verifies identity, and lines up four lending companies - without requiring the user to fill out scores of forms or go to an office. All on the mobile phone, in minutes. Facebook, Google, and Apple could be doing this next year. It all starts by focusing on an underserved market, and there are plenty of those in financial services: there are almost 64M Americans without sufficient credit history and almost 2B people around the world without a bank account.

Thursday, July 6, 2017

Learning TensorFlow

TensorFlow is Google's deep learning library, released in 2015. In many ways, it may be puzzling why we should pay attention to it, given there are so many machine learning frameworks around that seem to be doing a pretty good job so far!

The following should provide a good motivation:

- TensorFlow supports GPUs.
- TensorFlow supports distributed computation.
- TensorFlow is primarily good for deep learning. Let's just say it seems much more focused on DL.

Note that TensorFlow today is roughly equivalent to the numpy module in Python. There is a lot of development still going on, and hopefully easy-to-use libraries like scikit-learn will be available soon. One may also point out that Apache Spark provides distributed computation, has an ML library, and supports GPUs as well. So why not just use Spark by itself? The answer may be that Spark is not focused on DL as much as TF is. Moreover, the distributed computation model of Spark is very different from TensorFlow's. Spark has a resource manager, hidden from the user, that parallelizes an RDD computation over a cluster. TensorFlow's distributed programming involves the user, and the program has a lot more control over the computation. IMO, Spark may sit in the data pipeline ahead of TensorFlow, to massage, clean, and process the data used to train a very large neural network. At this point, TensorFlow needs considerable simplification of its cluster management and programming API before it can be used by data scientists used to working with tools like numpy/R or Spark.
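One concrete difference from numpy is that TensorFlow (as of its 1.x API) is deferred: operations build a computation graph first, and nothing is computed until you run it in a session. Here is a pure-Python sketch of that deferred-execution idea; the class and function names are illustrative, not the real TensorFlow API.

```python
class Node:
    """A node in a toy computation graph: a function plus its input nodes."""
    def __init__(self, fn, inputs):
        self.fn, self.inputs = fn, inputs

    def run(self, feed):
        """Evaluate the graph rooted at this node, pulling values from feed."""
        vals = [n.run(feed) for n in self.inputs]
        return self.fn(feed, vals)

def placeholder(name):
    return Node(lambda feed, _: feed[name], [])

def add(a, b):
    return Node(lambda feed, v: v[0] + v[1], [a, b])

def mul(a, b):
    return Node(lambda feed, v: v[0] * v[1], [a, b])

# Build the graph y = (x * w) + b. No arithmetic happens yet -
# unlike numpy, where each expression is evaluated immediately.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
y = add(mul(x, w), b)

# Execution is a separate step, analogous to session.run() with a feed_dict.
print(y.run({"x": 3.0, "w": 2.0, "b": 1.0}))  # 7.0
```

Separating graph construction from execution is what lets TensorFlow place pieces of the graph on GPUs or remote workers before anything runs, and it is also why its programming model feels heavier than numpy's.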

Here are some good talks and links to understand TensorFlow better: