Generative AI – with training wheels?

Go, uncertain staff member, try out all the wonders that await you. We'll be here to stop you wiping out... and taking our data with you.
18 July 2023

When you’re trying something new and dangerous, you need help so you don’t fall over and cry.

• Generative AI has a data-hunger problem for companies.
• A new service can prevent staff leaking data to the generative AI.
• There are ethical considerations, so buy-in from HR is essential.

Generative AI is one of those ideas that’s “too big to fail,” and too useful to get rid of, even if we could.

Since it arrived, initially in the form of ChatGPT from the Microsoft-backed OpenAI and quickly followed by Google Bard and others, it’s been adopted into both business and civilian life with – perhaps – more and blinder trust than it warrants, because when it works, it works brilliantly and transformatively.

But issues with the tech have been ongoing – not least its capacity to absorb the data you feed it and use it outside your sphere of control. That means, as Samsung found out in April 2023, that generative AI can take a company’s proprietary data and make it legally available to the world – provided it’s been given that data by someone with legitimate access to it, say a company employee.

That’s made some companies wary of using generative AI in ways that might help them get ahead in their sector, for fear of losing sensitive, proprietary or even personal data.

Samsung itself initially stopped its staff using ChatGPT entirely in the wake of its data accident, and many other companies have followed suit – despite knowing that they risk falling way behind the curve of their industry by not making use of the bright, shiny new technology that all the cool kids (and, more to the point, all the rich kids) are using.

In Part 1 of this article, we spoke to Rich Davis, Head of Solutions Marketing at Netskope – a company that claims to have a world-first product that makes generative AI data-safe for company use. He explained the dilemma many companies found themselves in: use generative AI, or be absolutely certain of their data safety.

Then he told us that Netskope had been working for a decade on protecting data in transit from users to SaaS apps, and that it had developed a parser that could speak the language used between clients and generative AI.

We asked him how companies were using this apparently world-first tool that, in theory at least, could make generative AI safe in even the most… Samsung… of situations.

Getting visibility over generative AI usage.

RD:

So far, about 25% of our user base has used the tool to just gain visibility – a case of “just tell me on a daily basis what we are seeing and what tools are in use, what are people doing with the generative AI.” Are staff mainly posting data for queries? Is that data long form or short form? Or are they mainly playing with it? Are they using non-company data? Or are people just trying to, say, summarize last season’s NFL action in a sentence?

A lot of companies are literally just trying to get a handle on how the generative AI is being used, because this is all still pretty new.
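(A quick illustrative aside: a minimal sketch of what that visibility reporting might boil down to – not Netskope’s actual implementation; the log fields, tool names and the “long form” threshold below are our own assumptions – could aggregate a day’s prompt events by tool and query length, like so.)

```python
# Illustrative sketch only: aggregate hypothetical generative AI usage logs
# into a simple daily visibility report. Field names and values are assumptions.
from collections import Counter, defaultdict

events = [
    {"user": "u1", "tool": "ChatGPT", "prompt": "Summarize last season's NFL action in a sentence."},
    {"user": "u2", "tool": "Bard", "prompt": "Draft a 2,000-word product brief for the Q3 launch... " * 5},
    {"user": "u1", "tool": "ChatGPT", "prompt": "Rewrite this paragraph to sound friendlier."},
]

LONG_FORM_THRESHOLD = 200  # characters; an arbitrary cut-off for "long form" queries

tools_in_use = Counter(e["tool"] for e in events)
query_shape = defaultdict(Counter)
for e in events:
    shape = "long form" if len(e["prompt"]) >= LONG_FORM_THRESHOLD else "short form"
    query_shape[e["tool"]][shape] += 1

print("Tools in use today:", dict(tools_in_use))
for tool, counts in query_shape.items():
    print(f"{tool}: {dict(counts)}")
```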

THQ:

That’s the thing, isn’t it? It has so many potential uses, if you just let staff go nuts with it, they may not yet appreciate what they could be using it for. Or, alternatively, you might have tech-savvy staff that are all about the “How can I make this more efficient?” queries.

RD:

Yeah, we see a lot of the NFL-style queries, but we’re seeing a lot of sensitive data in the queries too. Around 25% of our client base is using it, and we have between 2,500 and 3,000 large organizations. So a fair number are already using our tool for visibility.

The next step is application access – what do we allow, what don’t we allow – starting to control it simply on an app-by-app basis. For instance, we’re going to allow something like Jasper, because if you have a marketing arm, you could be using that to look at whether you could improve your copywriting. But say, as an organization, we don’t want to enable some of the other apps out there. Our tool lets you allow or disallow access as appropriate.
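(Another illustrative aside: a bare-bones sketch of that second tier – a per-app allow/deny policy. The app names, the policy format and the deny-by-default behavior are assumptions for illustration, not Netskope’s configuration.)

```python
# Illustrative sketch of a per-application allow/deny policy for generative AI
# tools. The policy format, app names and default action are assumptions.
ACCESS_POLICY = {
    "jasper": "allow",    # e.g. the marketing arm improving its copywriting
    "chatgpt": "allow",
    "unvetted-ai-app": "deny",
}
DEFAULT_ACTION = "deny"   # anything not explicitly vetted is blocked

def app_access_decision(app_name: str) -> str:
    """Return 'allow' or 'deny' for a given generative AI application."""
    return ACCESS_POLICY.get(app_name.lower(), DEFAULT_ACTION)

print(app_access_decision("Jasper"))           # allow
print(app_access_decision("random-gpt-tool"))  # deny (falls back to the default)
```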

And then there’s a third tier.

THQ:

You understand that when you say it like that, it sounds like it comes with three dramatic chords, right?

Generative AI training wheels – the third tier.

RD:

That third tier – and I would say about half of that 25% are already doing this – is actually using our ability to look deeply into the transaction and understand exactly what the user is doing. We can look at the data they’re sending and decide to block the request if it contains sensitive data.

THQ:

So your tool is a kind of data cop?

RD:

Sort of. But it’s also sort of like a firewall. People can still use the generative AI, they can type in whatever they like – but as soon as there’s sensitive data going in, it’s blocked.

THQ:

And then what? Flashing alarms? SWAT teams dropping down ropes from the ceiling? Weeping staff taken away to a data-gulag?

RD:

Ha. They get a coaching alert that says these are the boundaries, these are the terms of how we can use this tool today.
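(Illustrative aside: a hedged sketch of how an inline sensitive-data check and a coaching alert might fit together. The detection patterns and the wording of the message are assumptions, not the product’s actual rules.)

```python
# Illustrative sketch: an inline check that blocks a prompt containing
# sensitive data before it is forwarded to a generative AI service, and
# returns a coaching message instead. Patterns and wording are assumptions.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                      # US SSN-like numbers
    re.compile(r"\bconfidential\b|\bboard minutes\b", re.I),   # crude keyword flags
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),                     # card-number-like digit runs
]

COACHING_MESSAGE = (
    "This prompt appears to contain sensitive data and was not sent. "
    "These are the boundaries: please review the safe usage policy before continuing."
)

def inspect_prompt(prompt: str) -> dict:
    """Decide whether to forward the prompt or block it with a coaching alert."""
    if any(p.search(prompt) for p in SENSITIVE_PATTERNS):
        return {"action": "block", "message": COACHING_MESSAGE}
    return {"action": "forward", "message": None}

print(inspect_prompt("Summarize last season's NFL action in a sentence."))
print(inspect_prompt("Here are the confidential board minutes from Tuesday..."))
```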

THQ:

So that last tier trains people by allowing them to bump into safe walls, but without setting the place on fire or accidentally donating proprietary data to the Russians?

RD:

Exactly. Actually, a lot of organizations have initial warnings when you start using generative AI that say “Don’t post sensitive information, please have a read of the safe usage policy” – and then you hit Enter to continue. So it’s a kind of initial re-setting of the mindset, an initial training in the dos and don’ts.

Then you’ve got that hard safeguard at the end of the line, which kicks in if staff go on to do something stupid, like posting the minutes of the latest board meeting into the generative AI – and stops that data from actually being sent up to the AI.
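(One more illustrative aside: a minimal sketch of that first-use warning, with the hard safeguard above sitting behind it. The banner wording and the way acknowledgements are recorded are assumptions for illustration.)

```python
# Illustrative sketch: a one-time acknowledgement gate shown the first time a
# user opens a generative AI tool, before any inline blocking kicks in.
# The banner wording and in-memory record of acknowledgements are assumptions.
ACKNOWLEDGED_USERS = set()

USAGE_BANNER = (
    "Don't post sensitive information. Please have a read of the safe usage "
    "policy, then press Enter to continue."
)

def first_use_gate(user_id: str) -> None:
    """Show the banner once per user; they press Enter to acknowledge it."""
    if user_id not in ACKNOWLEDGED_USERS:
        input(USAGE_BANNER)           # the user's Enter keypress is the acknowledgement
        ACKNOWLEDGED_USERS.add(user_id)

first_use_gate("u1")  # banner shown once; subsequent calls pass straight through
```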

Generative AI needs to be made safe.

Pathways to safety will begin to appear in increasing numbers.

It’s a soft approach to training, with a hard-removal tool behind it, because actually, at the end of the day, we can’t have this information going out. It’s really no different to what we’ve been doing for years across any application. It’s just a different case here because of the risk of where this data could potentially go, and how it could potentially be reused. That danger’s not necessarily there if you’re just storing files in a personal OneDrive, for instance.

Awareness of generative AI data leak potential.

THQ:

It’s a question, isn’t it – how aware are staff generally that the Samsung sort of data leak could happen? Unless they’re fairly tech-savvy, they’ll usually have a mainstream media view of it, but they won’t necessarily know what privileged data is, and therefore that they shouldn’t be using it in generative AI.

RD:

No, exactly. They could well just be thinking “This could make my life easier.”

That’s why training is so important – because you’re not putting that brick wall up. As soon as anyone sees a restriction they go, “okay, how am I going to get around this, I want to play with it. I’ll try a different generative AI.”

So actually saying, “Yes, you can use this, and here are the risks” helps a great deal with user acceptance. And this is an approach that, even outside of generative AI, a lot of our customers have been using for quite some time.

THQ:

Training wheels, rather than baby gates.

If prevented from using one generative AI, staff may well just use another.

Tell people they can’t do a thing without explaining why, and they’ll just find another way.

RD:

Exactly. And the other positive to that approach is that, as an organization, you get an understanding of the usage, because you’re putting it in people’s hands and seeing what the usage is – whereas if you just block it in case your data goes flying out the door, you’ve got no idea what problems people were trying to solve with it.

We’re seeing growth in usage of about 25% month-on-month at the moment, as you’d expect in the corporate environment, and about one in ten users in an organization are using one of these generative AI tools on a regular basis – on actual work, within their work environment.

THQ:

Is there not something creepy in the idea of monitoring the usage of a particular program across companies?

RD:

Yes, there is actually.

THQ:

O…kay. It’s fair to say we didn’t expect you to agree on that.

RD:

This has always been a big debate, and I’ve talked to CISOs for years about this whole concept. I think it comes down to needing to protect the business. It’s that trade-off between monitoring users and protecting the organization. And that’s why, when you’re doing this, you’ve obviously got to put safeguards in place: you’ve got to anonymize user data so you can’t pinpoint a particular individual, and you’ve got to tightly limit who can see these types of reports.
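(Illustrative aside: one way such a safeguard could look – usage reports that pseudonymize users before anyone reviews them. The salted hashing here is our assumption, not a description of Netskope’s reporting.)

```python
# Illustrative sketch: pseudonymize user identifiers before they appear in
# usage reports, so reviewers see trends rather than named individuals.
# The salting scheme is an assumption; in practice the salt would be stored
# securely and access to the reports tightly restricted.
import hashlib
import secrets

REPORT_SALT = secrets.token_hex(16)  # e.g. rotated per reporting period

def pseudonymize(user_id: str) -> str:
    """Replace a user identifier with a salted, one-way token."""
    digest = hashlib.sha256((REPORT_SALT + user_id).encode()).hexdigest()
    return f"user-{digest[:10]}"

usage_events = [("alice@example.com", "ChatGPT"), ("bob@example.com", "Bard")]
report = [(pseudonymize(user), tool) for user, tool in usage_events]
print(report)  # e.g. [('user-3f9c2a81d0', 'ChatGPT'), ('user-7be014c2aa', 'Bard')]
```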

One thing I would always say to any organization is that you’ve got to do something like this in conjunction with HR. You’ve got to bring your HR team into this whole approach from the beginning if you’re going to do this type of analysis and control.

Monitoring staff use of generative AI could have ethical considerations.

Monitoring and intervening with your staff’s use of generative AI? Mmm, you’re going to need some organizational buy-in on that.

Where I’ve seen it really work well is in organizations that have had that organizational buy-in from the beginning – where they’ve gone through and explained it to the users: “This is why we’re putting this technology in place. These are the risks to the business. And this is why we’re doing it.” That’s where it alleviates most of the creep factor, because people get it.

 

In Part 3 of this article, we’ll look at the layers of protection that make up the first effective method of keeping your data safe while using generative AI within a company.
