Tukeys Blog

Friday, February 2, 2024

The conundrum with Generative AI

Generative AI presents very big opportunities but also carries many risks. We review the current risk mitigation strategies and explore where we could go next as an industry where all players must play their part to achieve the best potential of this technology.

Benefits of Generative AI

Since 2022, there has been a 1000% increase in the use of Generative AI content on the internet. Some say the AI could be the biggest game changer in our fight against hunger and poverty while others claim that it could widen the gap between rich and poor. Time will tell and with AI, time moves faster than other technologies we have experienced before - in just a few months, many industries have already embraced Generative AI at different depths to introduce enhanced products. For example:

1. Using publicly available pre-trained AI Models: Grammarly, SciSummary and Legalyze are some of the companies using Generative AI to enhance their products with better interfaces and newer capabilities.

2. Self-trained AI Models: BloombergGPT , Meta’s SeamlessM4T, Tabnine are some of the examples where companies have spent a lot more resources to train generative AI models on their own custom corpus to build tools which function better in their domains.

3. Generative AI for Audio, Video and Imagery: Meta’s Audiocraft, Eleven Lab’s Dubbing Studio, and Video generators from Synthesia and Deepbrain are some examples.

Risks

Clearly, Generative AI has numerous benefits. So, what is the conundrum here? While Generative AI can solve many hard problems, it also presents a clear and present danger to our society and way of life: Generative AI has been become the killer app for spammers, fraudsters, and misinformation spreaders. It has become a winning tool for politicians, especially where winning is important but extremely difficult or expensive due to regulatory environment. Several recent examples have emerged:

1. An AI generated voice recording on Facebook was used to disrupted elections in Slovakia.

2. Politicians in Argentina used AI tools like Midjourney to generate fake images of rivals to influence voters. Politicians also claimed to be victimized by attackers that used Generative AI when in fact the information was as real and credible as it could be.

3. Robocalls using an AI voice resembling President Joe Biden urged democratic primary voters to skip.

Current Solutions

Social media platforms have resorted to a mix of solutions ranging from automated detection tools to crowdsourced content moderation and fact checkers. For example:

1. Meta uses fact checking organizations like Demagog and StopFake

2. X uses CommunityNotes, an approved set of volunteers.

However, research shows these solutions have much lower efficacy in the political domain. First, the speed of response at the required scale does not match the viral speed at which the attack spreads. Second, seeking out consensus across divergent perspectives in a polarized political environment can be a daunting challenge for fact checkers too. The links I provided in the above examples in fact indicate many such limitations.

How do we address the gaps in 2024?

What can we do to address the possibility that there could be much more magnified versions of these attacks that may undermine political events in 2024?

The approach to solving this must be multi-pronged.

Step 1: Improved legislative stance against tech and media companies to increase ethical accountability.

The general technology for Generative AI is mostly open sourced and well known in the research community. However, tech and media companies are the dominant enablers of such technology to the masses. By making it harder for the tech companies to ignore their ethical responsibilities, we can build a strong foundation for a comprehensive solution. The legislative actions taken in Europe, US, Canada and many other countries on AI Governance, Risk Framework and Compliance are very good examples of how this is being achieved.

Step 2: Use technology to ensure fast and accurate detection of Generative AI in digital content.

Generative AI content is mostly produced and consumed inside applications and devices that have capabilities to mark and verify that content. Examples:

1. Applications that generate media like photoshop, Audiocraft and Synthesia.

2. Social networks that host media like Facebook, Instagram and X.

3. Applications that consume media like Web Browsers, Youtube and other mobile apps

The next step would be to improve how these applications can secure digital content and alert recipients as quickly and reliably as possible to the presence of generative AI. Here we can use at least three techniques:

1. Watermarking: Watermarking involves hiding a label, sound or text inside the media and obscuring it so it is hard to detect. Such techniques are much more effective with images and video than text and can be used in applications that generate or modify media. There is absolutely no harm in generating watermarks within these applications. However, since generative AI is mostly open source, hackers and spammers can circumvent watermarking by using alternate services/training their own models.

2. Pattern Detection: Tools like SynthID and GPTZero use pattern detection to identify generative AI content like text and images. Such tools can be integrated in Social Networks and Browsers. Care must be taken to constantly improve these tools and account for errors as Generative AI will continue to evolve and become better and evading detection.

3. Coalition for Content Provenance and Authenticity (C2PA): C2PA is the most promising option as it presents a fast, reliable, and yet accurate way compared to other options. C2PA can be used to add cryptographic information to detect the tampering of media files and streams. C2PA is an open standard that focuses on providing the history and context for any digital media. This technology is very similar to X.509 certificates used by websites and payment services. Just like a certificate owned by a website allows a browser to verify the authenticity of the website and its owner, C2PA helps consumers identify who created a digital file and its complete modification history. Examples of applications that already implement C2PA include Adobe Photoshop and a Chrome extension released by Digimarc that can help consumers check images. C2PA can also be used along with blockchain technologies to certify the authenticity of images for news organizations.

C2PA has built-in safeguards (including PKI-based digital signatures) to ensure that the authorship or origin information is accurate and can’t be faked or falsified. C2PA has been available as early as 2022 and but its adoption is still very weak. Government intervention may be an option to accelerate adoption across the industry.

In conclusion, while Generative AI presents extraordinary benefits, it is imperative to address its associated risks through comprehensive strategies involving technology and policy in addition to just public awareness.

Sunday, May 6, 2018

The real impact of the All of Us Program

Today, May 6, happens to be the national launch day for NIH's All of Us program. This is a program that focuses on precision medicine - an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person (definition from NIH).

Precision medicine is not a new concept. If you google for it, you can find several documented cases, especially in cancer treatment where genomic information of a single patient was used to specifically target his or her disease successfully. The Director of the All of Us program, Eric Dishman happens to be such a case and is the kind of leader the program needs to make sure everyone can make use of it.

Using the genomic data to precisely understand and define medical treatment for an individual seems logical. But how can we generalize this and make its reach wider? How can the treatment of one person at one end of the world have a positive impact on a patient at another end? How can we connect the dots? Can we broker the innovation produced by Facebook to socially engineer the ad targeting into our own healthcare? Can we target medicines or health treatments to an audience instead of ads?

The answer is of course a resounding YES. It is resounding enough to reveal that All of Us is not the first program to attempt to do so. So why would it be more successful than other data collection programs? It has many good things going on for it. Besides large national funding and a passionate team that is behind it, it attempts to engage community, capture diversity and use all available technology.

As I wrote in my previous article, the power of using data to help complement today's medicine is huge. The All of Us program extends the scope of this power to come from a much wider data source. This is the data collected from a lot of people and a lot of types of data besides just genomic data. It targets not just patients but also healthy people. So why is all of this necessary if we can just take someone's genome and use it to tweak their treatment. Think of what Facebook does by collecting large amounts of data from a large set of people having diverse behavior. Think of how Alexa learns to do some smart things by learning from questions asked by a large number of people. Perhaps a quick lesson from machine learning would be useful here.

Machine learning from data involves looking for patterns when trying to see why certain choices or events or input parameters led to a set of outcomes. Finding the patterns needs diversity of data, else machine learning can go very very wrong. Data which is skewed towards certain outcomes can lead to very poor models. In addition, the presence of redundant inputs or the absence of key inputs can lead to poor models too. Having too little data is also not helpful and can lead to failed models in real world. Data scientists often combine the results of many models that are based on many sets of data to build a more real world robust model.

That is precisely why the program attempts to collect a diverse set of data from a diverse set of healthy and non-healthy individuals and make it available to a diverse set of researchers. This could fundamentally change how medicine works in a few years. Think of how much machine learning technologies have impacted our day to day work in the last ten years. And how cheap it has become to research and deliver the benefits! Very simple health related outcomes could be achieved in the next couple of years itself. More complex cloud based solutions that use this data to buy health benefit programs "as a service" could evolve in few years!

Saturday, May 5, 2018

So really what is the All of Us Research Program?

Saturday today and I was stopped in my tracks to finish my chores several times. Tomorrow (Sunday) is the big launch. The big launch of what, they asked? They were messaging me all over the place on social media. Just google it, I said.

It finally is here - the big day, May 6 where NIH is engaged in the national launch of the program that has been in gestation so far for several months with thousands of beta participants. The program aims to enroll one million participants to volunteer their data and other information so researchers can use it to improve medicine.

I had no idea of this program until the middle of last year when I came across Vibrent (based in NoVA) and met its CEO, Dr Praduman Jain who later went on to explain the benefits and goals of Precision Medicine very passionately to me. And then it finally dawned on me.

Well that was then. Fast forward almost four months and here we are. We have been so busy building up the functionality and infrastructure for the program that it has been all to easy to forget what the program is about!

A few months ago, I read an article by Chris Anderson, former editor in chief of Wired magazine, that I thought applied very well here: “The end of theory: the data deluge makes the scientific method obsolete” (http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory/). As the title indicates, Anderson asserted that in the era of petabyte information and supercomputing, the traditional, hypothesis‐driven scientific method would become obsolete. No more theories or hypotheses, no more discussions whether the experimental results refute or support the original hypotheses. In this new era, what counts are sophisticated algorithms and statistical tools to sift through a massive amount of data to find information that could be turned into knowledge (source: NIH).

The All of Us Research Program can be instrumental in this process. Today, scientific research happens in the laboratory. A sample or specimen is observed, tests developed and then trials are done before getting approvals and mass rollout. This is still the wide majority of cases. Behavioral scientists have obviously been using data for their research regularly and there are few cases where actual patient data was used to build out a specific medicine formula that was then given to them. The All of Us Research Program has the potential not only to lower the costs of research in medicine but also to increase the efficacy of medicine.

Consider the long tail problem in developing medicines. The All of Us Research program can have a huge impact on solving this issue. If I was to give an analogy, this program has the ability to do what fiber optic cables did for the internet. I am not talking about the stock market crash of 1999 ! I am referring to the immense large possibilities of breaking down and then solving a very complex and large problem. Imagine what kind of collaboration you could get if this kind of data is available across national boundaries and what if it can be used and compared on demand by researchers all over the globe? This could completely democratize the problem of finding cures for diseases and help develop a very competitive market for medicines in the next 20-30 years. Hopefully that happens in my lifetime!

PS: This blog describes my person viewpoint on the All of Us Program. For the official definition or for signing up for the program, please visit the NIH Join All of Us website or download the apps from the App Store or Play Store. If you are interested in the crazy engineering that goes behind building this, join Vibrent Health. We are on the lookout for passionate (and crazy) engineers that want to change the world.

Tuesday, July 18, 2017

Can our payment institutions innovate?

Banks earning season is rolling in with Q2/2017 earnings. Most major banks including Bank of America, Wells Fargo, Goldman Sachs etc have lowered their guidance for the rest of the year citing lots of reasons - trading, market making, interest rates etc. Mortgage and lending business is also under pressure it seems. The same time Netflix reported record consumer growth in Q2/2017 while also suggesting strong negative free cash flows for the suggestible future. The market was happy at that and sent the stock surging. Its just part of doing business, right? That just sounds too bad for U.S banks which are considered fundamental pillars of the economy. Fundamental enough that taxpayers bailed them out few years back but still not allowed to innovate.

Bitter truth is that innovation in core parts of the economy can make the economy unstable. Innovation implies risk and risk can materialize once in a while. So we allow innovation to happen on the fringes of the economy. Fringe enough that if we loose it, not many people would notice.

Back to FinTech or Innovation in financial services. In the U.S/EU FinTech is helping evolve micro-lending and online banking industries. The pace is much more rapid in the developing parts of the world though, like Africa. However, we may be surprised to hear that U.S is a laggard when it comes to financial innovation. The reason is not that innovators in the U.S are not smart enough - but they simply focus on other important problems like talking robots, space travel and auto driving cars.

So what is going on with FinTech that should worry us? Here is a news flash - China is fast becoming the new FinTech capital of the world, taking that title away from the western world. The transactions volumes processed by Baidu, Tencent and Alibaba are growing so fast that they are set to pass the volumes Visa and Mastercard process in Europe and US in 2017. China's mobile payment system is does not use any high-technology though. It uses bar codes to process payments! Thats way inferior and insecure than the system used by Mastercard and Visa (EMVCO) or by Google/Applepay. One thing it gets right though - its fast, convenient and cheap and thats what matters. It has connected about a billion people in China who never had any bank accounts or credit history before. So why should we be worried? Chinese companies are slowly importing this system into U.S and EU through partnerships and deals with processors like Stripe and companies like AirBnB etc. They are so big and powerful that U.S companies have little choice accommodating them. That would slowly cut into the western financial system and would cause a huge disruption in the economies in the next ten years.

What can we learn from the success of the Chinese system? First, it processes transactions without communicating with the banks each time. If the amount of the transaction has a limit and the replication is fast enough, this should work. The result is that this makes it very fast for high volume processing. It is cheap and convenient because there is no need to have expensive NFC hardware or scanning devices or chips/cards. Compare that with the our system - there are two banks involved. We use a dedicated transmission network for the payments and there are middlemen involved. More middlemen mean more fees and commissions including the need to keep reconciling data as it passes all the intermediate systems. Chip cards make things even slower. Evolution in our system is not going to be easy since the credit card companies have a tight grip on end users with rewards and points systems.

What can save us from certain failure is the need for innovation - the U.S still has around 50 million people without bank accounts and thus no credit cards. PayNearMe tapped into this market by offering an easy way to make cash based payments where credit cards were not an option. So who is going to get to it first!

How social networking combined with NLP analytics is helping expand the economy

Recently I have been researching the rise of NLP again. This was the topic of my bachelor thesis in 1995, almost 20 years ago and it has now become a hot area of research again in the last 5 years. The science and tools have evolved and a lot of new open source tools like NLTK are available for researchers.

Clearly the early users of social networking data were doing a lot of sentiment analysis on it to determine trends for companies, products, politics etc. Things have changed now - governments are interested in scouring billions of bytes of data generated daily in social networks for intelligence hints, Fin-Tech upstarts are starting to successfully use the same data to disrupt financial services - for example, the lending industry.

The sale of Troo.ly to Airbnb spiked my interest into this. Research revealed the existence of a whole bunch of companies and existing patents in the science of using social network data to determine a person's trust or risk score. Appears that lot of tricks have evolved in the last five years. However, this is quite scary as well! In the next few years, I can see a lot of people trying to use this score instead of just pulling the credit scores in business transactions. This could be landlords, people selling goods on craigslist (e.g cars etc), small business engaging in seasonal hiring to keep costs low. It could certainly replace simple background checks. While this is all great, the scary part is that this presents a Kafkian situation for people at the other end of it. Unlike their FICO scores, there is a lot less visibility into how the machine learning algorithms work. Imagine a customer service rep trying to explain why your social risk was considered too high when renting an apartment for a few days on AirBnB.

This led me to findings how the big Chinese payment companies (Ant Financial, Baidu) are now planning to use all the data that they collect from repeated payment transactions and the social networks they own to determine credit scores of people. This idea is definitely not new but it is breaking new ground in the last couple of years. For example, companies like HelloSoda, Guardian Analytics have been doing risk scoring for quite some time now. There are also a lot of banking and lending upstarts in U.S and EU - Kabbage, Simple, Moven, Fidor etc. However, FinTech has taken social networking data to a new level now and this time its not just to send marketing and sales offers (like the famous American Express examples show) or to make your banking very cool indeed.

There are multiple innovation areas - companies like Jumo have made a significant penetration in the underbanked markets of Africa to enable micro-lending based on social networking data that they collect. In China, Baidu already has more than 10% of its assets involved in some kind of lending, the biggest being in the education market. Now Alibaba and Tencent are also getting involved. Not only is the mobile payments market in China set to pass the transactions Visa and Mastercard process but it is also spinning out new uses of the data collected. In one way it looks like social networking companies may have an edge over managing risk and may have slowly started to disintermediate traditional lending companies. There are 800 million people using Tencent for payments that have no credit history with the central bank and the daily transaction patterns reflect a lot about their behavior and risk. Take a look at AutoGravity - users can not only select cars, schedule a test drive but once they connect it to their social networking account, the site also prefills the application, verifies identity and lines up four lending companies without having the user fill up scores of forms or go to an office. All on the mobile phone in minutes. Facebook, Google and Apple could be doing this next year. It all starts by focusing on an underserved market and there are enough in financial services - there are almost 64M Americans without sufficient credit history and almost 2B people around the world without a bank account.