Krzysztof Witczak

Challenges in shipping Gen AI-based systems

May 12, 2024

Roughly a year and a half ago OpenAI released ChatGPT to the public and the world went crazy! A lot of that was early hype and excitement, discussion around possibilities, but then Klarna let go 700 people claiming their gen AI bot replaced that area, potentially leading to $40 million in profit for the company. Yikes!

It’s not a surprise that many companies look for their chance in gen AI opportunities, or are pressed by investors who hope for great ROI from such feats. Let’s take a look at gen AI-specific difficulties in making it happen.

Hallucination

The most commonly mentioned issue or fear is that gen AI hallucinates, which means it returns responses that seem real at first glance, but they are not. It’s kinda funny if you present it in a way that gen AI can “lie” to its users 😄 It can even lie about quotes, citations, sources it took knowledge from… In the real world though that causes limitations - if your system has to be precise and accurate, false information may harm you:

  • Double-checking. Law, medical, and financial advisors won’t trust gen AI as long as they have to waste time verifying all of the details themselves.
  • PR damage. If you play Baldur’s Gate 3, on Reddit there are a bunch of annoyed players mentioning game tutorials and walkthroughs being AI generated - with false NPC names, and false quests… it caused the website to be unreliable for players, so they look for their answers elsewhere.

    • We even have a new IT buzzword for unwanted AI-generated content - Slop. Make sure you’re not contributing to it! 😄
  • New type of safety attacks. For engineers that opens a can of worms like new threats like package hallucination attacks or misclassification attacks

Prevention

  • Implement RAG system and make sure your data sources are of good quality - they don’t contradict each other, they are groomed, up to date.
  • Reduce ambiguity - for example, focus on simple prompts and limited-choice answers, instead of open questions.
  • Avoid math (even basic) 😅 - instead, extract data and do math programmatically later.
  • Apply prompt techniques like chain of thought, self-consistency and others to increase accuracy.
  • Sandwich defence - wrap user prompt in pre-prompt (specify role, instructions) and post-prompt (put self-reminders). More.
  • Avoid null. Specify to result back “don’t know” if that’s the case but ideally prevent it even from happening as an option, since LLMs have difficulties in handling emptiness.
  • Use DevSecOps tooling to catch the package hallucination (Snyk or others) ideally to run locally on dev machines before they damage themselves…
  • Lower down temperature of the model, to balance its creativity with accuracy.

Jailbreak - the “screenshot attack”

This is essentially convincing the AI bot to do a task which it was not supposed to do. That may be for pure trolling reasons or it may be an attack to gain a specific benefit. Lately, it’s usually a discount:

  • User convinced Air Canada to sell the tickets for a lower price… and later airline was held responsible to pay the user $812.
  • User tried to convince car dealership chatbot to sell him a $70k car for $1. Then the user posted a screenshot - those “trophies” of proving AI (or company) silly (or user smart!) usually bring bad reputation to the company, and this is why it’s sometimes being called “the screenshot attack”. In the end, the guy didn’t get the car, but his goal was mostly to have fun (phew).

However, you have to be aware that sophisticated jailbreak methods designed for specific models are already in place, publicly accessible.

Prevention

  • Use Moderation API from OpenAI or Content Safety API from Azure or similar tools.
  • Use Guardrails AI, notice how many specific validators it has. Seems to be an amazing tool that will gain popularity in the coming years and become a standard, unless gen AI providers make their APIs even better.

Prompt Injection

You can do jailbreak by just “convincing” the AI system to do something (I like to call it a successful charisma roll 🎲). For prompt injection you have to figure out how the prompt uses user input and how (and what) you can do with it. This is the same use case as with typical SQL injection and usually ends up with more serious consequences:

  • User can obtain a prompt which was used on the bot as initial prompt which can lead to stronger attacks.
  • User can list documents used/injected through RAG which may be confidential.
  • Same as with jailbreak, the user can ask the bot to ignore everything which was prompted before and do something embarrassing.

Prevention

  • Raise awareness. Your engineers likely don’t know about new problems since the area is pretty fresh.

    • Use tools like Agile Threat Modelling to guide the team through new vector attacks related to gen AI.
    • Share https://tensortrust.ai/ with them - it’s a browser game designed to hack gen AI - pretty cool and you’re learning by doing in a safe environment!
  • You can filter input or output (block/allow lists) through the program or even through separate LLM evaluation (Moderation API mentioned above is is sort of the same thing, also this is how Gemini grooms user prompts).
  • You can instruct the model before or after the prompt to be aware of the possibility of the user acting maliciously.
  • You can make sure to tag user input in a way to increases the chances it won’t be easily escaped by the user (think SQL injection through the closing string - however with AI we can use any tag we want, even random numbers as entering and closing tag).
  • Use Guardrails AI mentioned above.
  • However, you should know that to this date, any of these methods does not guarantee 100% safety.

Regressions

This may happen due to the model being updated by its creators, or prompt being slightly modified, or maybe RAG resources were modified, or anything in between. In general, it’s the same as with typical software regressions, the difference is that we have fewer ways of dealing with it effectively.

Prevention

  • Try promptfoo which works as automated test suite for prompts.
  • Consider using LangChain Evaluators.
  • Feature flags to control the spread of an issue.
  • Manual testing is a typical last resort for most complex situations.

Less deterministic/reliable tech requires a different design

By its nature, gen AI responses, depending on the task, will vary, even for the same prompt. That’s the beauty of it, causing gen AI to mimic human behaviour, but may cause issues like:

  • wrong format of response in less than 1% of situations
  • ignoring explicit instructions
  • hallucinations mentioned before The worst thing is that it happens sometimes, and it’s difficult or impossible to reproduce. That makes interaction with our system less reliable and stands out as a low quality.

Prevention

  • Reduce ambiguity from the system through all of the tricks listed above!
  • Make for the system to easily raise from a failure, even an unknown one, and repeat (limited number of times, think circuit breaker), fall-back to a backup model or reply in standard fashion.
  • Implement human in the loop (HIL) system instead of fully relying on AI.
  • You can try the “complementary agents” technique and build AI agent advisory board to help them figure out the right answer from the non-typical situation.

Incoming compliance requirements

At the time of writing this post, EU AI Act is about to start ticking its 6 months deadline till the first regulations come live. It has its website - https://artificialintelligenceact.eu/the-act/ where the act is translated into multiple languages. In my eyes this document will be a similar thing in consequence to IT as GDPR was - we will have to ensure compliance in many areas. This is the first regulation of such scale but there are more to come, from the USA, China, UK - more details can be found in “The State of AI Regulations in 2024”.

Preparation

  • Use the EU tool to run self-evaluation using EU AI Act Compliance Checker.
  • Evaluate the risk of your AI system to know if it would be considered a high or unacceptable risk system by the EU AI Act.

    • If it’s unacceptable, you’re screwed, and call a lawyer to figure this out, you have 6 months!
    • If it’s high, you’ll have to build the entire AI governance practice in your company (just as you did with data governance for GDPR) - and you’ll have 24 months to do so. Governance will require a better threat analysis, quality control, and safeguards to be in place, and probably you’ll have to use many of the tools listed above to show proof that you’re holding hand on a pulse.
    • It’s important to notice that high risk is not only those “typical” high-risk systems in IT like health, safety, and security. It’s also HR systems, and tools to support recruitment, and hiring. The idea is that any tool which may cause AI to affect your career or access to education is considered a high-risk system.
  • If the risk is medium, you’ll have to at least annotate anything which is AI-generated with specific, explicit text or tags. That means Gen AI-generated text, images, and videos but also interactions (bots on the website, phone callers) - the idea is that users have to understand they are interacting with AI (so I guess Gen AI passed the Turing test after all 😉).
  • Additionally you’ll have to assure traceability of AI decisions for up to 6 months, which may turn out to be complicated when matched with GDPR and the black-box nature of AI models. The idea is that users may come back later to you and ask why AI made a certain decision which impacted them.
  • Consider running yourself against extra checklists like AI Guardian Checklist or OWASP LLM AI Governance Checklist.

As with any compliance, the EU AI Act has a couple of issues already identified but unclarified:

  • Act defines multiple roles, but it’s not always clear to define who you are in those systems - the difference between provider and deployer of AI system is very blurry but responsibilities are much higher on providers.
  • Even risk classification is blurry - survey by appliedAI Initiative GmbH on 106 AI systems showed up that for nearly 40% it was unclear to which category of risk they should belong. Since high risk category comes with huge governance overhead for organizations, it’s important distinction.
  • My wife pointed out to me - at which level of modification or interaction with a given work, it will be considered to be AI-created? I guess that if I create an article from scratch and ask Gen-ai to groom it, it’s not, but if Gen-ai creates an article from scratch, it is. However, we can expect this to be more and more tangled with higher adoption and usage of AI tools, more human-in-the-loop systems and so on.

Because of GDPR, we can already see that certain AI tooling opens later in the EU than in other countries (like Gemini Pro 1.5). I guess that this compliance will make the EU a safer, but even more difficult area to release products in and it will strongly affect big companies which we know for creating large models. We can expect more powerful models in 2025 to be released in the EU with a delay compared to the US.

Summary

Gen AI can allow businesses to make a leap, but it’s not for free and you can land in something which looks like mud but smells worse.

We have to learn how to gain value from those systems, and turn it into revenue, but watch out for potential damage we can cause by the bleeding edge technical QA & safety difficulties and compliance. From everything I mentioned above, I think the https://www.guardrailsai.com/ may be the most promising tool to try to early adopt.

Ignoring the threats listed above may cause you to lose users or market, but be aware that you can also lose it even easier - with gen AI abuse and wave of slop they don’t want to see.

All things in moderation, including moderation.”

😉