Indian Railways (IRCTC) made headlines earlier this week when news broke that it had invited companies to bid for recommendations on how to monetize passenger data. While the railways have since twice denied the reports, calling them “fictitious”, the tender document can be viewed on the company’s website. The disclaimer and the document contradict each other.

Conversations with people at the Ministry of Electronics and Information Technology (MeITY) suggest there’s a lot going on in the background. The intention, it seems, is to position India as the global capital of artificial intelligence (AI) model-making. And they don’t rule out the possibility that the IRCTC, although it has now blocked the idea, wants to be counted among those with a “first-mover advantage”, alongside big pharma, e-commerce and fintech companies pushing the boundaries of technology.

The background is that a Personal Data Protection (PDP) Bill for India had been in the works for nearly a decade. It was first introduced in the Lok Sabha in December 2019, but was abruptly withdrawn earlier this month, with MeITY releasing a statement that an amended bill would soon be introduced. The bill, in its original guise, recognized privacy as a fundamental right. Speaking off the record, those working on the bill say that right will be upheld, and that a new version of the amended bill will be tabled in Parliament by December this year.

Before we get to the amendments, what got people upset about the IRCTC proposals?

The first is that the scope of work defined in the document clearly includes the review of personal data: people’s names, ages, cell numbers, genders, addresses, email IDs, logins and passwords, among others. This is a violation of privacy. Ironically, the document also states that the selected company “will consider various laws or statutes, including the Information Technology Act 2000 and its amendments, user data privacy laws including GDPR (General Data Protection Regulation) and the current India Personal Data Protection Bill 2018, and come up with the digital asset monetization business models accordingly”.

But those who work in the industry argue that there is nothing “out of the ordinary” here. As examples, they point out that all over the world, companies that process data operate in a grey area. Fintech companies deploy AI to examine personal data so they can decide whether or not to lend. Hospitals use medical scans to predict the outcomes of diseases such as cancer. E-commerce websites and browsers store data to serve personalized advertisements. And how can you ignore that the worst offenders include Google’s Chrome browser and Amazon’s voice assistant Alexa? All of these entities claim this means better outcomes for all stakeholders. But does it?

Take medical data, for example. Someone may have given consent to offer their personal data for diagnostic purposes. These are primary data. But they may not have given their consent to the use of their data for cancer research (secondary data) – even if it is anonymised. A pharmaceutical company may argue that consent to data collection implies that it can be used as primary and secondary data. Not only that, the study of this data is how medical knowledge develops over time. It’s the same argument that fintech companies, e-commerce sites and internet browsers use.

The counter-argument here is that secondary data is monetized. The silos where this data resides earn large sums of money for the entities that hold them. And over time, as their silos grow, those entities learn to build better models with better results. And earn more.

This raises a tricky question: what is in it for someone whose primary data is deployed as secondary data? To get around this problem, a new experiment tentatively called the “Differential Privacy Model” is being tested across India. It could not be independently confirmed if IRCTC is one of the test cases. But what could be confirmed is that work is underway to create “Bio Banks”. These are places where large samples of medical data, tissue samples and genetic data are stored. India is a great place to do this because of its diverse population.

A use case? Pharmaceutical companies can run their models against these biobanks to test the effectiveness of drugs under development across diverse samples.

As it stands, to build a model that understands the difference between a cat and a dog, for example, the AI has to look at images that reside in a database. In the proposed scheme, the model is first trained to tell the two apart by feeding it labeled images of cats and dogs.
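To make the “feeding it labeled images” step concrete, here is a minimal, purely illustrative sketch of supervised training. The “images” are stand-ins (each reduced to two hypothetical numeric features), and the classifier is a simple logistic regression rather than anything IRCTC or the ministry has specified:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for labeled images: each "image" is reduced to
# two numeric features, with cats and dogs drawn from different clusters.
cats = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
dogs = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
X = np.vstack([cats, dogs])
y = np.array([0] * 50 + [1] * 50)  # labels: 0 = cat, 1 = dog

# Logistic regression trained by gradient descent -- the "learning
# from labeled examples" step described in the text.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(dog)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

def predict(point):
    """Classify a new feature vector with the trained model."""
    return "dog" if (point @ w + b) > 0 else "cat"
```

Once trained this way, the model can be shipped to run against other databases, which is where the clean-room scheme described next comes in.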

For the model to improve, it can then inspect the databases where such images exist. An agency provides a “computational guarantee” that no personally identifiable information has been used, that the work was done in a “certified clean room”, and that the model works. This model can be extrapolated to other ecosystems, such as fintech, medicine and e-commerce, and the potential is starting to show.
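The article names a “Differential Privacy Model” without spelling out the mechanics. A standard building block of differential privacy is the Laplace mechanism: a database answers only aggregate queries, with calibrated noise added so that no individual record can be inferred from the answer. A minimal sketch, assuming a hypothetical dataset and privacy budget:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical records: ages of passengers in a database.
ages = np.array([34, 27, 45, 62, 19, 51, 38, 29])

def dp_count_over(data, threshold, epsilon):
    """Differentially private count of records above a threshold.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon satisfies epsilon-differential privacy for the query.
    """
    true_count = int(np.sum(data > threshold))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Smaller epsilon means more noise: stronger privacy, less accuracy.
noisy = dp_count_over(ages, threshold=40, epsilon=0.5)
```

The design trade-off is exactly the one the article gestures at: the entity querying the silo gets statistically useful answers, while any single passenger’s record stays hidden in the noise.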

An additional layer being worked on is that any entity wishing to access these databases must pay a fee. Eventually, these fees will be distributed as royalties to the people whose secondary data resides in the databases.

So, hypothetically, if the IRCTC database were to be used by an entity to refine its model, then the monies paid to the IRCTC would have to be equally distributed among the people whose data it touched. One way to do this is to offer discount coupons, for example, to book a train journey. Or vouchers to buy meals on long-distance trains.
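The arithmetic of that hypothetical is simple: a flat access fee divided equally among the passengers whose records the model touched, returned as coupon value. A sketch, with all names and amounts invented for illustration:

```python
# Hypothetical sketch of the royalty idea from the text: an access fee
# split equally among affected passengers and returned as coupons.

def split_fee_as_coupons(access_fee_rupees, passenger_ids):
    """Divide an access fee equally among the passengers touched."""
    share = access_fee_rupees / len(passenger_ids)
    return {pid: round(share, 2) for pid in passenger_ids}

# A 100,000-rupee fee touching four (made-up) passenger records:
coupons = split_fee_as_coupons(100_000, ["P001", "P002", "P003", "P004"])
```

In practice the split would presumably be weighted by how much of each person’s data was used, but the source describes only an equal distribution.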

While this sounds extremely ingenious and ambitious, it also raises multiple questions that Indian policy makers need to grapple with. For starters, if a Biobank is financially lucrative, how do you protect financially vulnerable people who would be willing to give up their tissues and privacy for pennies? We will wait until the end of the year for more details.