Businesses should base AI data security strategies on the data security investments they have already made
It’s a safe bet that generative AI has caused more than a few chief information security officers (CISOs) and data protection officers (DPOs) to lose some sleep. Worried about the risk that large language models (LLMs) could expose sensitive data to unauthorized access, cybersecurity and data security leaders have scrambled to adapt enterprise data protection strategies for the age of AI.
For some businesses, this process has meant rethinking data security from the ground up and implementing new protections tailored for AI. They have set out to build entirely new controls, believing this is the only way to handle the unique security challenges that AI data poses.
But arguably, this isn’t the best way to approach modern enterprise data security needs. Instead of starting from scratch, organizations should build upon the data protections they already have in place. At the end of the day, the data security risks associated with generative AI aren’t actually all that different from those that businesses have long faced – although it may not seem so at first glance – and many organizations will find that they already have a solid foundation for securing AI data. They just need to leverage old security controls in new ways.
AI Data Security Challenges: The Basics
To understand how businesses can extend existing data protection strategies to protect the information they leverage for AI, let’s first discuss how AI uses data and why it creates novel data security challenges.
Data, of course, is the currency of AI – especially generative AI. The more data you feed into a model, the more effective it’s likely to be.
And not just any type of data. To support specific use cases well, AI models require data tailored to those use cases.
This means that if you’re an organization that wants to use AI to do something that aligns with the unique requirements of your business – like handling customer support queries or optimizing your business processes – you’ll need to expose it to large quantities of potentially sensitive information that you would not share publicly.
However, once you feed data into a model – especially a model operated by a third party – you generally have little control over how the model will use the data. There’s a risk that a model trained on private data owned by one business could potentially use that data to create content for other businesses. Even in the case of internally developed models, data owned by one user or department could potentially leak to other groups if they all use the same model.
This is the key challenge that has kept CISOs and DPOs up at night over the past couple of years, as they have been tasked with helping their companies take advantage of AI without letting AI models become gaping holes in enterprise data security strategies.
We’ve Seen This Movie Before
In some respects, this may seem like an entirely novel challenge. Until ChatGPT appeared on the scene a couple of years ago, almost no businesses were even thinking about the threat that AI models posed to data security.
But in other ways, having to secure data at massive scale is not all that new. It has been a common concern for enterprises at least since the dawn of the Big Data boom in the 2000s. In most cases that data wasn’t used to power AI workloads, but it was large-scale, sensitive data all the same.
As a result, many organizations have already built controls to protect sensitive data at scale. They use role-based access control (RBAC) systems to manage access to unstructured data, such as documents and media files. They configure access settings for databases to control which types of queries users can run. And they establish data governance policies to manage where and how they allow their data to be used; for example, different rules may come into play when companies share data with a third-party SaaS app compared to using data within an app they develop in-house.
Viewed from this perspective, we’re not exactly shooting in the dark when it comes to securing data for AI. To be sure, there are some novel challenges to contend with – above all, the fact that it’s hard to control precisely what a model will do with data once it ingests it – but most enterprises already have a solid foundation in place for securing and governing the data that they could potentially feed into AI models.
Extending Data Security Strategies to Support AI
That’s why businesses that want to work smarter, not harder, should base AI data security strategies on the data security investments they have already made, rather than trying to rebuild everything from the ground up.
Here are some actionable steps to that end.
1. Tidy Up Unstructured Data
When working with unstructured data that you might feed to an AI model, make sure you have effective access controls in place for that data. Although most businesses have data governance policies that require unstructured data to be secured in theory, in practice it’s easy to end up with sensitive data that is misclassified or stored in the wrong place, and that consequently lacks the access restrictions it should have.
To fix this issue, inventory and classify your unstructured data. AI can help here by automatically parsing files and assessing whether proper RBAC controls are in place for each one.
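As a rough illustration, here’s a minimal Python sketch of that inventory step. The directory path, file types, regex patterns and the world-readable check are placeholder assumptions; a real deployment would lean on a proper classification or DLP engine and would audit your actual RBAC system rather than file permissions.

```python
import re
from pathlib import Path

# Hypothetical sensitivity patterns; a real deployment would use a proper
# classification engine or DLP tooling rather than a couple of regexes.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify_file(path: Path) -> set[str]:
    """Return the sensitive data types detected in a text file."""
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return set()
    return {label for label, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)}

def audit_directory(root: str) -> list[dict]:
    """Inventory files under `root` and flag sensitive ones whose permissions
    are wide open (a stand-in for a real check against your RBAC system)."""
    findings = []
    for path in Path(root).rglob("*.txt"):
        labels = classify_file(path)
        world_readable = bool(path.stat().st_mode & 0o004)
        if labels and world_readable:
            findings.append({
                "file": str(path),
                "labels": sorted(labels),
                "issue": "sensitive content with open permissions",
            })
    return findings

if __name__ == "__main__":
    for finding in audit_directory("./shared-docs"):
        print(finding)
```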
Then, when you feed the data into an LLM, use tokenization to keep it tied to the RBAC policies you already enforce: sensitive values are replaced with opaque tokens before they reach the model, and only users authorized under those policies can resolve the tokens back to real data. This lets you restrict which users can access data via the LLM – effectively extending traditional access controls into your AI models.
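As a minimal sketch of that idea: the model only ever sees tokens, and the tokens can only be resolved by users who hold the right role. The in-memory vault and role names here are illustrative assumptions; in practice you would use a dedicated tokenization service wired to your existing RBAC policies.

```python
import uuid

# In-memory token vault purely for illustration; production systems would use
# a dedicated tokenization service backed by existing RBAC policies.
TOKEN_VAULT: dict[str, dict] = {}

def tokenize(value: str, required_role: str) -> str:
    """Replace a sensitive value with an opaque token tied to an RBAC role."""
    token = f"tok_{uuid.uuid4().hex[:12]}"
    TOKEN_VAULT[token] = {"value": value, "required_role": required_role}
    return token

def detokenize(token: str, user_roles: set[str]) -> str:
    """Resolve a token only if the caller holds the role the policy demands."""
    entry = TOKEN_VAULT.get(token)
    if entry is None:
        return token  # not a token, return unchanged
    if entry["required_role"] not in user_roles:
        return "[REDACTED]"
    return entry["value"]

# The LLM only ever sees the token, never the raw customer email.
token = tokenize("jane.doe@example.com", required_role="support-agent")
prompt = f"Summarize the latest ticket from customer {token}."

# When rendering the model's output, detokenize per user:
print(detokenize(token, user_roles={"support-agent"}))  # authorized: real value
print(detokenize(token, user_roles={"marketing"}))      # unauthorized: redacted
```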
2. Use AI to Query Databases Without Ingesting Data
When you’re dealing with structured data, such as information stored in a database, a great way to mitigate data security concerns is to use AI to write database queries, rather than letting it train on the information in the database itself. That way, you can block any unauthorized queries through the database’s access controls, instead of relying on your AI model to secure data access. This is another simple means of extending data controls you already have in place into AI use cases.
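Here’s a sketch of that pattern, assuming a PostgreSQL database and a text-to-SQL call you’d wire up to your own LLM provider (the `generate_sql` stub below is a placeholder, not a real API). The important part is that the generated query runs under the end user’s own database credentials, so the database’s existing grants decide what it can touch.

```python
import psycopg2  # assumes PostgreSQL; adapt the driver for your engine

def generate_sql(question: str) -> str:
    """Placeholder for a call to your LLM provider: the model turns a
    natural-language question into SQL but never sees the rows themselves."""
    # e.g. return llm_client.complete(f"Write a SQL query that answers: {question}")
    raise NotImplementedError("wire this up to your LLM provider")

def answer_question(question: str, user_dsn: str) -> list[tuple]:
    """Run model-generated SQL under the *user's own* database role.

    Because the connection uses the end user's credentials rather than a
    privileged service account, an unauthorized query fails at the database,
    not somewhere inside the model."""
    sql = generate_sql(question)
    with psycopg2.connect(user_dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
```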
3. Align Data Governance Rules with AI Models
Just as you likely have data governance rules in place to manage how traditional apps can access your business’s data, you can establish similar policies to manage how AI models interact with data. For instance, you might decide that some types of data can only be exposed to models you operate in-house, whereas others can be shared with third-party AI services, like Azure OpenAI Service.
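One lightweight way to express such a rule is a simple mapping from data classification to permitted model destinations, checked before any data leaves your environment. The classification tiers and destination names below are illustrative assumptions, not a standard; substitute your own scheme.

```python
# Illustrative governance policy: which data classifications may be sent to
# which model destinations. Adapt the tiers and destinations to your own scheme.
AI_DATA_POLICY: dict[str, set[str]] = {
    "public":       {"in_house_model", "third_party_service"},
    "internal":     {"in_house_model", "third_party_service"},
    "confidential": {"in_house_model"},
    "restricted":   set(),  # never exposed to any model
}

def may_send(classification: str, destination: str) -> bool:
    """Return True if data with this classification may go to this destination."""
    return destination in AI_DATA_POLICY.get(classification, set())

assert may_send("internal", "third_party_service")
assert not may_send("confidential", "third_party_service")
```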
The Future of Data Security in the Age of AI
Practices like these won’t solve every AI data security challenge. But they are effective steps that enterprises can take today to tighten controls over the data they expose to AI models. Just as importantly, they’re feasible measures that don’t require businesses to reinvent the wheel. They simply involve leveraging existing data security investments in new ways.
Published: AI Business