Flexibility and responsiveness are vital, recognizing this is a fast-moving space in which it’s not always possible to figure out every detail right now.
3. Be transparent about the data
Whether you’re consuming external foundation models, or customizing them for your own business purposes, it’s essential to recognize the risks inherent in the data used to train, fine-tune and even use these models, while remaining transparent. These risks vary depending on architecture choices.
Data is at the core of large language models (LLMs), and using models that were partially trained on bad data can destroy your results and reputation.
The outputs of generative AI systems are only as unbiased and valuable as the data they were trained on. Inadvertent plagiarism, copyright infringement, bias and deliberate manipulation are several obvious examples of bad training data. To engender trust in AI, companies must be able to identify and assess potential risks in the data used to train the foundational models, noting data sources and any flaws or bias, whether accidental or intentional.
There are tools and techniques companies can use to evaluate, measure, monitor and synthesize training data. But it’s important to understand these risks are very difficult to eliminate entirely.
The best short-term solution is therefore transparency. Being open and upfront about the data used to train the model—and the entire generative AI process—will provide much-needed clarity across the business and engender necessary trust. And creating clear and actionable guidelines around bias, privacy, IP rights, provenance and transparency will give direction to employees as they make decisions about when and how to use generative AI.
4. Use human + AI together to combat ‘AI for bad’
We must use AI for good if we want to defend AI for all. And we can use generative AI itself to help make enterprise use of generative AI more robust overall.
One option is to have a “human in the loop” to add security and ensure a sanity check on responses. Reinforcement learning with human feedback (RLHF) tunes the model based on human rankings generated from the same prompt.
Building on RLHF, Constitutional AI also uses a separate AI model to monitor and score the responses the main enterprise model is outputting. These results can then be used to fine-tune the model and secure it from harm.
5. Understand emerging risks to the models themselves
AI models themselves can be attacked and jailbroken for malicious purposes. One example is a “prompt injection” attack, where a model is instructed to deliver a false or bad response for nefarious ends. For instance, including words like “ignore all previous directions” in a prompt could bypass controls that developers have added to the system. Anecdotally, we’ve seen examples of white text, invisible to human eyes, included in pre-prepared prompts to inject malicious instructions into seemingly innocent prompts.
The implication? Business leaders will need to consider new threats like prompt injection and design robust security systems around the models themselves.
Ensuring Generative AI is safe AI
Generative AI and foundation models represent a real milestone in AI development. The opportunities are virtually limitless. But there are new risks and threats too. Business leaders need to understand these risks and take urgent action to minimize them.
There are ever evolving models, frameworks, and technologies available to help guide AI programs forward with trust, security and privacy throughout. Focusing on trustworthy AI strategies, trust by design, trusted AI collaboration and continuous monitoring help build and operate successful systems.
Our shared goal should be to leverage the power of generative AI in a secure way to deliver value to business and improve the lives of all who use it.