Welcome to the first post in a brand new series, "AI for Cities." Given the buzz around artificial intelligence, many cities are investigating its potential to improve services, control personnel costs, and make their cities smarter. But, as with any new technology, the content landscape is barbell-shaped. On one end is the super-technical writing meant for people already familiar with the subject matter, and on the other is a bunch of grifters just trying to make a quick buck by hyping the technology.
This first post is simply meant to help you understand some of the common terms and phrases. Feel free to bookmark this page and use it as a reference in the future. We'll keep it continually updated, and even include links to articles (some from us, some from others) that can help you dig a little deeper.
With all that out of the way, let's get started!
Core Concepts
Artificial Intelligence
Artificial intelligence, or AI, refers to the ability of machines to perform tasks that typically require human intelligence, like understanding language, recognizing patterns, making decisions, or generating content.
The concept dates back to the 1950s, when researchers first imagined computers that could mimic aspects of human thinking. Early AI focused on solving logic problems and playing games like chess, but today’s AI is powered by massive amounts of data and advanced computing, allowing it to learn and adapt without being explicitly programmed for every task.
Machine Learning
Machine learning is a subset of artificial intelligence that allows computers to learn from data and improve over time without being explicitly programmed for every step.
Traditional software follows a pre-defined set of rules and logic. When something happens that hasn't already been accounted for, the software "throws" an error that halts the program in its tracks.
Machine learning allows you to create programs that can function based on finding patterns in large quantities of data. Then, when new data arrives that hasn't been seen yet, it can make some educated guesses about what to do next instead of crashing.
Like AI, machine learning has been around for a while. But the introduction of increasingly powerful computers has made its presence in our lives more commonplace.
Cities might encounter machine learning in tools that predict which streets need repair, analyze permit applications, or detect unusual utility usage. It is the engine behind many of the smart technologies transforming local government operations today.
Generative AI
Generative AI is a type of artificial intelligence that can create new content based on patterns it has learned from existing data. This includes generating text, images, audio, video, and even computer code.
Imagine an AI model trained on traffic violation data. It might be able to take information about a new traffic violation and find similar violations to look for patterns. A generative AI version of this model might go a step further and create realistic, synthetic examples of repeat-offense scenarios that could be used for training or enforcement planning.
Generative AI is often discussed in terms of language models, image generation, and similar tasks, but its use cases are certainly not confined to those.
Large Language Models (LLMs) and Chatbots
Large Language Models, or LLMs, are a specific type of generative AI designed to understand and produce human language. They are trained on enormous amounts of text from books, websites, and other sources to learn how language works at scale.
LLMs can answer questions, summarize documents, draft emails, translate languages, and much more by predicting the next word in a sentence based on the context provided.
Like all AI models, LLMs receive some new input, find patterns that relate to the data they were trained on, and make predictions about how to respond. What makes LLMs powerful is their ability to understand intent and respond in clear, natural language.
Neural Networks and Deep Learning
Neural networks are the underlying technology behind many modern AI systems, including large language models and image recognition tools. Inspired by the structure of the human brain, neural networks are made up of layers of interconnected nodes, or "neurons," that process and transmit information.
When data moves through the network, each layer transforms it slightly, helping the system detect patterns and make sense of complex inputs like language, images, or numbers. The more layers a network has, the more sophisticated its understanding can become (this is known as deep learning).
Neural networks allow AI to learn from examples rather than following fixed rules, making them ideal for tasks like speech recognition, document classification, or detecting patterns in large city datasets.
Language, Input, and Interaction
Prompting
A prompt is the input or instruction you give to a generative AI system to guide its response. In tools like ChatGPT or Google Gemini, a prompt can be a question, a command, or even a few words that set the context for what you want the AI to do. The quality and clarity of the prompt often determine how useful the output will be. Learning how to write good prompts is an essential skill when working with AI tools.
Prompt Engineering
Prompt engineering is the practice of crafting effective and precise prompts to get the most accurate and useful responses from an AI system.
Because these models don't "think" in the traditional sense but respond based on patterns in their training data, the way a prompt is worded greatly influences the output. Prompt engineering involves choosing the right tone, format, and level of detail to steer the AI in the right direction. You might also have to tweak your prompts depending on the Large Language Model you are using.
For example, asking “List the key points from this meeting transcript in bullet form, written for a non-technical audience” is more likely to yield a helpful result than a vague request like “Summarize this.” It’s a powerful skill that helps non-technical users communicate more effectively with AI tools.
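To make this concrete, here's a minimal sketch of sending that more specific prompt to a model, assuming the OpenAI Python library (version 1 or later) and an API key set in your environment. The model name and the transcript variable are placeholders, and other providers have their own libraries that work similarly.

```python
# A minimal sketch of sending a well-structured prompt to a language model.
# Assumes the OpenAI Python library (v1+) with an API key in the environment;
# the model name and transcript below are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = "..."  # the meeting transcript you want summarized

prompt = (
    "List the key points from this meeting transcript in bullet form, "
    "written for a non-technical audience.\n\n"
    f"Transcript:\n{transcript}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```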
Chain-of-Thought
Chain-of-thought prompting is a technique used in prompt engineering to help AI models work through tough problems step by step. Instead of asking for a quick answer, this approach encourages the model to explain its thinking as it works toward a solution.
For example, rather than saying “What’s the total cost of this project?” you might prompt the AI with “Let’s calculate the total project cost step by step. First, list the expense categories, then add them up.” This method helps the AI avoid mistakes and produce more logical, reliable outputs. It’s especially helpful for complex questions that benefit from breaking things down into smaller, manageable parts.
Reasoning Models
Reasoning models are AI systems specifically designed or fine-tuned to perform complex, multi-step problem-solving tasks by mimicking logical or structured thinking. Unlike general-purpose language models that rely on surface-level patterns in text, reasoning models are built to analyze information, break problems into parts, compare outcomes, and even check their own work. These models often integrate techniques like planning, symbolic logic, or mathematical inference, making them better suited for tasks that require deeper understanding, such as legal analysis, multi-step calculations, or policy tradeoff evaluations.
This contrasts with chain-of-thought (CoT) prompting, which is not a type of model but a technique used with general models. CoT prompting encourages step-by-step reasoning by explicitly asking the model to explain its thinking before giving an answer. For example, prompting with “Let’s solve this in steps” can help the AI produce more accurate or structured responses, but it still relies on the model’s existing abilities.
Reasoning models are typically more powerful, take more time to run, and (when used via an API) cost more than standard models using CoT, but they often perform better on more complex tasks.
Few-Shot Learning
Few-shot learning is a method in machine learning where an AI model can learn to perform a new task after seeing only a few examples. Instead of needing thousands of labeled data points to understand how to do something, the model is shown just a handful (sometimes as few as two or three) and uses its existing knowledge to generalize and respond correctly.
This approach is especially useful in large language models, which are trained on a broad range of topics. For example, if you show the model two examples of how your city formats a meeting summary, you can then prompt it to generate a new summary in the same style. In local government, few-shot learning is helpful for adapting AI to city-specific processes (like interpreting permit types, drafting council memos, or flagging relevant parts of a zoning request) without needing to retrain the model from scratch. It’s a fast and accessible way to customize AI behavior for real-world use.
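As an illustration, here's a hedged sketch of a few-shot prompt: two existing meeting summaries are included as examples, and the model is asked to produce a third in the same style. The example summaries are made up, and the code only assembles the prompt text; sending it to a model works the same way as in the prompting example above.

```python
# A sketch of few-shot prompting: show the model a couple of examples of the
# format you want, then ask it to handle a new case the same way.
# The example summaries below are illustrative placeholders.

examples = [
    ("Council meeting, March 4",
     "- Approved sidewalk repair contract\n- Set budget hearing for April"),
    ("Council meeting, March 18",
     "- Adopted updated fence ordinance\n- Heard public comment on park hours"),
]

new_transcript = "..."  # the transcript you want summarized in the same style

prompt_parts = ["Summarize city council meetings in the style shown below.\n"]
for title, summary in examples:
    prompt_parts.append(f"Meeting: {title}\nSummary:\n{summary}\n")
prompt_parts.append(f"Meeting: (new)\nTranscript:\n{new_transcript}\nSummary:")

prompt = "\n".join(prompt_parts)
# 'prompt' can now be sent to a language model as in the earlier example.
print(prompt)
```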
Reinforcement Learning
Reinforcement learning is a type of machine learning where an AI system learns by interacting with its environment and receiving feedback in the form of rewards or penalties. Instead of being told exactly what to do, the system tries different actions and learns over time which choices lead to better outcomes. This approach is similar to how humans or animals learn through trial and error.
Reinforcement learning is often used in situations that involve decision-making over time, like teaching a Roomba where your furniture is, optimizing traffic signals, or training an AI to play a game. In some AI models used today, including tools that respond to natural language, reinforcement learning is used alongside human feedback to improve how the system prioritizes helpful, accurate, or ethical responses.
Tokens
Tokens are the basic units of text that AI models process when reading or generating language. A token can be as short as a single character or as long as a whole word, and is often a chunk of a word (roughly a syllable), depending on the model’s design. For example, the word “downtown” might be one token, while “taxpayer-funded” could be split into several tokens.
Tokens matter because language models don't read entire documents the way humans do. Instead, they break down everything into tokens and use those to understand and predict text. Every prompt you give and every response the AI generates counts toward a token limit.
There are many elements related to tokens that can impact your evaluation of an LLM, including the size of its token vocabulary (which influences how accurate its predictions can be), how quickly a model can produce tokens, and how much each token (both input and output) costs.
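If you're curious what tokenization looks like in practice, here's a small sketch using the open source tiktoken library. The encoding name below is an assumption tied to certain OpenAI models; other models use different tokenizers and will split the same text differently.

```python
# A small sketch of counting tokens with the tiktoken library.
# The encoding name is an assumption; different models use different tokenizers.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

for text in ["downtown", "taxpayer-funded", "The council meets on Tuesday."]:
    tokens = encoding.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens: {tokens}")
```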
Context Windows
Context windows refer to the amount of information an AI model can "remember" and use at one time while generating a response. Think of it as the model's short-term memory. It includes both the prompt you give and any previous conversation or content that the AI is drawing from.
The size of a context window is measured in tokens, and different models have different limits. For example, older models might only handle about 4,000 tokens (roughly 3,000 words), while newer models claim they can handle up to 1 million tokens or more (enough for an entire novel or hundreds of pages of city documents). Context window size can also impact long-running chat conversations. The model might begin clipping content, and thereby missing important details, if the chat conversation overflows the context window.
The longer the context window, the more a model can process and generate. However, studies have shown that many models with longer context windows still struggle to retain specific facts from the "middle area" (although the beginning and end of the provided context often still perform well).
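Here's a simple sketch of what that clipping can look like in practice: older messages are dropped from a conversation until the whole thing fits under an assumed token limit. The 4,000-token limit is a placeholder, and the token counting reuses the tiktoken approach from the previous example.

```python
# A sketch of trimming a chat history to fit an assumed context window.
# The 4,000-token limit is a placeholder; real limits vary by model.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
CONTEXT_LIMIT = 4000

def count_tokens(message: str) -> int:
    return len(encoding.encode(message))

def trim_history(messages: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    """Drop the oldest messages until the conversation fits in the window."""
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > limit:
        trimmed.pop(0)  # the oldest message is lost, along with any details it held
    return trimmed

history = ["(older message) ...", "(recent message) ..."]
print(trim_history(history))
```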
Behind the Scenes
Training Data
Training data refers to the collection of text, images, numbers, or other information that an AI model learns from during its development. For many language models, this includes a vast mix of publicly available books, websites, articles, and other written content. By analyzing patterns in this data, the model learns how language works: how questions are asked, how answers are structured, and how different topics are discussed.
The quality and scope of the training data strongly influence how the model behaves. If the data includes a wide range of voices and examples, the model tends to be more useful and balanced. But if certain perspectives are missing, or if the data is biased, those issues can show up in the AI's responses.
For cities, it's important to understand that AI tools aren’t drawing from your internal records unless you specifically connect them (via fine-tuning or other methods). They’re only as informed as the data they were trained on, and often require additional input to make them accurate or relevant to local government work.
Foundation Models
Foundation models are large, versatile AI models trained on massive, general-purpose datasets that can be adapted to perform a wide range of tasks. They serve as a base layer for many different applications, like chatbots, image generators, or summarization tools, and can be customized for specific use cases with relatively little additional training. Examples include OpenAI’s GPT models, Google’s Gemini, and Meta’s LLaMA.
What makes foundation models important is their flexibility. Rather than building a new AI system from scratch for every task, cities and organizations can start with a foundation model and layer on specific instructions, prompts, or local data. This allows teams to deploy tools for things like document search, resident Q&A, or code enforcement analysis more quickly and cost-effectively. Understanding foundation models helps city leaders see how one AI platform can support many different services under one roof.
Multimodal Models
Multimodal models are AI systems designed to understand and generate more than one type of input and output, such as text, images, audio, and video, all within the same model. Unlike traditional models that only work with one kind of data (like just text), multimodal models can, for example, read a city zoning map, interpret a caption, and answer a follow-up question about it in plain language.
These models are especially useful in city operations where different types of data need to be connected. A multimodal AI tool could analyze a photo of a damaged sidewalk, match it to a location on a GIS map, and generate a work order with a short description.
As cities increasingly work with diverse forms of data (documents, images, sensor feeds, and more) multimodal models will play a key role in automating complex tasks and improving how information is communicated and understood across departments.
Training an LLM
Training a large language model (LLM) involves feeding it a huge amount of text, like books, websites, articles, and other documents, so it can learn how language works. The model looks at all this information and learns to predict what word (or token) comes next in a sentence based on what came before. Over time, it gets better at understanding grammar, meaning, and context.
This training process requires powerful computers and can take weeks or months. Once trained, the model can be fine-tuned or customized with smaller, specific sets of data (like city records or policy documents) to make it more useful for particular tasks. Think of it like teaching a generalist first, then giving them city-specific knowledge to do the job well.
Fine-Tuning
Fine-tuning is the process of taking a pre-trained large language model and giving it additional, specific training to make it better at a particular task or domain. While the base model has already learned general language patterns from a wide range of public data, fine-tuning teaches it how to behave in a focused way, like writing in a formal city tone, interpreting permitting language, or following local policy rules.
This is done by providing the model with a curated set of examples that show how you want it to respond. For instance, a city might fine-tune a model using hundreds of past council meeting summaries, public notices, or email templates. The result is an AI tool that feels more tailored to your organization, saving staff time and improving consistency across communications.
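As a rough illustration, fine-tuning data is often prepared as a file of example conversations showing the desired input and the desired output. The sketch below writes a couple of examples in the JSON Lines format used by some providers; the exact file format and field names depend on the vendor, so treat this structure as an assumption and check your provider's documentation.

```python
# A sketch of preparing fine-tuning examples as a JSON Lines file.
# The chat-style format follows one common vendor convention; check your
# provider's documentation for the exact fields required.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You write formal city council meeting summaries."},
            {"role": "user", "content": "Summarize: the council approved the sidewalk contract."},
            {"role": "assistant", "content": "The Council approved the sidewalk repair contract as presented."},
        ]
    },
    # ...hundreds more examples drawn from past summaries, notices, or templates
]

with open("fine_tune_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```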
Custom GPTs
Custom GPTs are personalized versions of ChatGPT that are tailored to specific tasks, audiences, or workflows, without needing advanced programming or full-scale model training. They allow users to adjust the model’s behavior by setting special instructions, uploading relevant documents, and defining how the AI should respond in different situations.
For example, a city might create a custom GPT that acts like a virtual assistant for planning staff, able to answer zoning questions using uploaded ordinances and past case examples. Or one could be built for residents, guiding them through common 311 issues in plain language. Custom GPTs are a practical way to make AI more relevant and helpful for specific government needs, while staying easy to manage and deploy.
Open Source vs Proprietary Models
Open source models are AI systems whose code and sometimes training data are made publicly available. Anyone can inspect, use, modify, or build upon them, often at little or no cost. This makes them transparent and flexible, which can be valuable for cities that want more control over how AI tools are used or want to keep data processing in-house. Popular open source models include Meta’s LLaMA and Mistral. However, you'll have to manage all of the infrastructure and support yourself, often with little more assistance than a few Google searches.
Proprietary models, on the other hand, are developed and owned by private companies like OpenAI (ChatGPT), Anthropic (Claude), or Google (Gemini). These models are typically more advanced and user-friendly but come with licensing fees and limited visibility into how they work. Cities using proprietary models often benefit from better performance and support but give up some control and must rely on the vendor’s terms.
The choice between the two depends on the city’s goals, whether it values ease of use and support, or transparency and customization.
Additionally, proprietary models may not be suitable for handling sensitive or confidential information. Depending on the licensing and end-user agreements, anything you send to a proprietary model may be used for future training. At the very least, that sensitive information may be retained in server logs for some period of time.
Embeddings
Embeddings are a way to represent complex information, like words, images, or even entire documents, as sets of numbers in a mathematical space. These numbers capture the meaning or features of the original item in a way that makes it easier for a computer to compare, search, or analyze them. Items with similar meanings or characteristics end up close together in this space, while unrelated ones are farther apart.
In practical terms, embeddings allow AI systems to understand and work with data more effectively. For example, in a city application, you could use embeddings to help an AI tool find similar zoning cases, match public comments to relevant issues, or group resident service requests by topic, even if the wording is different. They're the backbone of features like semantic search, recommendations, and clustering.
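Here's a minimal sketch of the core idea: items become vectors of numbers, and similarity is measured by how close those vectors are. The embed() function below is a hypothetical stand-in for a real embedding model, and the tiny three-number vectors are made up purely for illustration (real embeddings have hundreds or thousands of dimensions).

```python
# A sketch of comparing items by embedding similarity.
# embed() is a hypothetical stand-in for a real embedding model, and the
# three-number vectors are made up purely for illustration.
import math

def embed(text: str) -> list[float]:
    # In practice this would call an embedding model; these values are fake.
    fake_vectors = {
        "trash pickup schedule": [0.9, 0.1, 0.0],
        "garbage collection days": [0.85, 0.15, 0.05],
        "fence height rules": [0.1, 0.9, 0.2],
    }
    return fake_vectors[text]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embed("trash pickup schedule"), embed("garbage collection days")))  # high
print(cosine_similarity(embed("trash pickup schedule"), embed("fence height rules")))       # low
```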
Vector Databases
Vector databases are specialized tools designed to store and search embeddings. Unlike traditional databases that search using keywords or categories, vector databases let AI systems search based on meaning or similarity. They work by organizing data in a way that allows the system to quickly find items that are “close” to each other in embedding space.
This is especially useful in AI-powered applications where you're looking for the most relevant content, not just matching words. For example, in a city context, a vector database could help staff quickly find similar public comments, retrieve past decisions related to a current zoning request, or match incoming emails to the right department based on topic. Vector databases make it possible for AI tools to search more like a human by understanding what something means, not just what it says.
Semantic Search
Semantic search is a method of searching that focuses on the meaning behind a query rather than just matching exact words. Traditional search systems rely on keyword matching. If you type “trash pickup,” they look for documents that include those exact words. Semantic search, on the other hand, uses AI and embeddings to understand the intent of your question and find related information, even if the wording is different.
For cities, semantic search is especially valuable when dealing with large volumes of public records, ordinances, or service requests. It can help staff or residents find relevant policies, past decisions, or meeting notes by asking natural-language questions like “What are the rules for building a fence?” even if the documents use technical phrases like “residential boundary structures.” This makes information more accessible, especially for non-experts.
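Putting embeddings, vector search, and semantic search together, here's a toy sketch: a couple of documents are embedded, the query is embedded, and the closest document wins even though the wording differs. The embed() function is again a hypothetical stand-in with made-up vectors; a real vector database performs this same "find the closest vectors" step efficiently over millions of items.

```python
# A toy sketch of semantic search: embed the documents, embed the query, and
# return the document whose vector is closest in meaning. embed() is a
# hypothetical stand-in; a vector database does this nearest-neighbor search
# efficiently at scale.
import math

def embed(text: str) -> list[float]:
    # Fake, hand-made vectors for illustration only.
    vectors = {
        "Residential boundary structures may not exceed six feet.": [0.1, 0.9, 0.1],
        "Garbage is collected on Tuesdays in Zone 3.": [0.9, 0.1, 0.0],
        "What are the rules for building a fence?": [0.15, 0.85, 0.2],
    }
    return vectors[text]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

documents = [
    "Residential boundary structures may not exceed six feet.",
    "Garbage is collected on Tuesdays in Zone 3.",
]

query = "What are the rules for building a fence?"
best = max(documents, key=lambda doc: cosine_similarity(embed(doc), embed(query)))
print(best)  # finds the fence ordinance even though the wording is different
```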
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI technique that combines two powerful capabilities: retrieving relevant information and generating responses using that information. Instead of relying only on what the AI model already "knows" from its training data, RAG allows the system to pull in up-to-date or domain-specific content, like city documents, ordinances, or meeting notes, before answering a question or generating text.
Here's how it works: when you ask a question, the system first searches a connected knowledge base (like a vector database or document archive) to retrieve the most relevant pieces of information. It then feeds that content into the prompt, and the model uses it to craft a response grounded in real, trusted sources. This approach improves accuracy, reduces hallucinations, and makes AI tools more useful for specific tasks, such as answering public inquiries based on local laws, summarizing historical decisions, or supporting staff with on-demand access to city procedures. RAG is especially important in government settings where factual reliability and transparency are critical.
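Here's a simplified sketch of that retrieve-then-generate flow. The retrieve() and generate() functions are hypothetical placeholders: in a real system, retrieve() would query a vector database and generate() would call a language model. The point is the shape of the pipeline, not the specific tools.

```python
# A simplified sketch of Retrieval-Augmented Generation (RAG).
# retrieve() and generate() are hypothetical placeholders: in a real system,
# retrieve() would query a vector database and generate() would call an LLM.

def retrieve(question: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    # Placeholder: a real implementation would rank documents by embedding similarity.
    return knowledge_base[:top_k]

def generate(prompt: str) -> str:
    # Placeholder: a real implementation would send the prompt to a language model.
    return f"[model response based on a prompt of {len(prompt)} characters]"

def answer_with_rag(question: str, knowledge_base: list[str]) -> str:
    sources = retrieve(question, knowledge_base)
    prompt = (
        "Answer the question using only the sources below. "
        "If the sources don't contain the answer, say so.\n\n"
        "Sources:\n" + "\n".join(f"- {s}" for s in sources) +
        f"\n\nQuestion: {question}"
    )
    return generate(prompt)

ordinances = [
    "Fences in residential zones may not exceed six feet.",
    "Permits are required for fences over four feet.",
]
print(answer_with_rag("Do I need a permit for a five-foot fence?", ordinances))
```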
Knowledge Graphs
Knowledge graphs are structured networks of information that represent real-world entities, like people, places, events, or policies, and the relationships between them. Each item in a knowledge graph is a node (such as "City Hall" or "Public Works Department"), and the connections between them (like "oversees" or "reports to") are called edges. Together, these form a map of how information is connected, not just stored.
In a city context, a knowledge graph could link departments, assets, service requests, and regulations to show how they relate. For example, a knowledge graph could connect a street repair to its funding source, the contractor responsible, nearby residents affected, and related council resolutions.
When integrated with AI tools, knowledge graphs help systems reason more accurately, surface relevant information faster, and answer complex questions that depend on understanding relationships, not just facts. They're like a smart, interconnected version of a database that "understands" how your city works.
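A tiny sketch of the idea, using plain Python data structures: each entry is an edge connecting two nodes with a labeled relationship. The specific departments, projects, and relationships below are made-up examples.

```python
# A tiny sketch of a knowledge graph as a list of (node, relationship, node)
# triples. The entities and relationships are made-up examples.

edges = [
    ("City Hall", "oversees", "Public Works Department"),
    ("Public Works Department", "responsible_for", "Street Repair #2024-17"),
    ("Street Repair #2024-17", "funded_by", "Capital Improvement Fund"),
    ("Street Repair #2024-17", "performed_by", "Acme Paving LLC"),
]

def related_to(entity: str):
    """Return every edge that touches the given entity."""
    return [e for e in edges if entity in (e[0], e[2])]

for source, relation, target in related_to("Street Repair #2024-17"):
    print(f"{source} --{relation}--> {target}")
```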
APIs
APIs, or Application Programming Interfaces, are tools that allow different software systems to communicate with each other. Think of an API as a messenger that carries requests between two programs: one asks for information or an action, and the other responds with data or results. APIs are how apps, databases, and services stay connected behind the scenes.
In the context of AI and city government, APIs are essential for integrating AI tools into existing systems. For example, an AI-powered chatbot might use APIs to pull data from a city’s 311 (service request) system, submit permit requests, or fetch the latest garbage pickup schedule. Instead of building everything from scratch, cities can use APIs to connect smart tools to the systems they already use, making workflows more efficient and services more responsive.
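For example, here's a hedged sketch of what calling a city API might look like from Python using the requests library. The URL, parameters, and response fields are entirely hypothetical; a real integration would follow the specific system's API documentation.

```python
# A sketch of calling a (hypothetical) city 311 API with the requests library.
# The URL, parameters, and response fields are placeholders, not a real service.
import requests

response = requests.get(
    "https://api.example-city.gov/311/requests",   # hypothetical endpoint
    params={"status": "open", "category": "pothole"},
    timeout=10,
)
response.raise_for_status()

for request in response.json().get("requests", []):  # assumed response shape
    print(request.get("id"), request.get("address"), request.get("status"))
```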
Inference
Inference in AI refers to the process of using a trained model to make predictions, generate responses, or perform tasks based on new input data. While training is about teaching the model how to understand patterns by learning from large datasets, inference is what happens afterward, when the model is actually put to use in the real world.
For example, if a city uses an AI model to detect potholes in street images, inference is the step where the model looks at a new photo and identifies whether a pothole is present. In language models, inference might involve generating a summary of a meeting based on a transcript.
Whether the task is translating text, recognizing faces, or forecasting utility usage, inference is the moment when AI applies what it has learned to deliver practical results. It’s what happens every time someone interacts with an AI-powered tool.
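To make the training-versus-inference distinction concrete, here's a small sketch using the scikit-learn library on made-up data about street segments: fit() is the training step, and predict() on new, unseen data is inference. The features, labels, and output are illustrative only.

```python
# A small sketch separating training from inference, using scikit-learn on
# made-up data: whether a street segment needs repair, based on its age and
# traffic volume.
from sklearn.linear_model import LogisticRegression

# Training: the model learns patterns from historical, labeled examples.
# Features are [street age in years, daily traffic in thousands of vehicles].
X_train = [[5, 1.2], [22, 5.4], [3, 0.8], [30, 7.0], [15, 3.0], [1, 0.4]]
y_train = [0, 1, 0, 1, 1, 0]              # 1 = needed repair, 0 = did not

model = LogisticRegression().fit(X_train, y_train)

# Inference: the trained model is applied to new, unseen street segments.
X_new = [[18, 4.5], [2, 0.6]]
print(model.predict(X_new))               # e.g. [1 0]
```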
AI Workflows
An AI workflow is the step-by-step process of how an AI system receives input, processes it, and delivers a result, often as part of an automated task or service. It includes all the moving parts: collecting data, preprocessing it, feeding it into a model (or retrieving relevant information), generating a response or prediction, and delivering the output to a user or system. AI workflows can also include logging, human review, or triggers that connect with other systems.
In short, an AI workflow is a well-defined process that uses AI to perform some (or all) of the work.
For example, in a city permit application system, the AI workflow might look like this:
- a resident uploads a PDF
- the AI extracts and reads the text
- it compares the content to zoning rules
- it flags potential issues
- then it generates a summary for staff review
AI workflows make it possible to embed smart automation into routine processes across departments.
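Here's a sketch of how that permit-review workflow might be wired together in code. Each step is a function, and the functions that would actually call AI models (extract_text, check_zoning, summarize) are hypothetical placeholders; the point is how the steps chain together.

```python
# A sketch of the permit-review workflow described above. The functions that
# would call AI models (extract_text, check_zoning, summarize) are hypothetical
# placeholders; the point is how the steps chain together.

def extract_text(pdf_path: str) -> str:
    return "placeholder text extracted from the PDF"        # e.g. OCR or a document model

def check_zoning(application_text: str) -> list[str]:
    return ["Setback may be below the 10-foot minimum"]     # e.g. rules plus an LLM

def summarize(application_text: str, issues: list[str]) -> str:
    return "Summary for staff: 1 potential issue flagged."  # e.g. an LLM call

def review_permit(pdf_path: str) -> str:
    text = extract_text(pdf_path)      # step 1: read the uploaded document
    issues = check_zoning(text)        # step 2: compare against zoning rules
    summary = summarize(text, issues)  # step 3: prepare output for human review
    return summary

print(review_permit("application.pdf"))
```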
AI Agents
An AI agent is an AI system that acts with a degree of autonomy to complete tasks, often by reasoning, planning, and taking actions across multiple steps or tools. Unlike simple chatbots that only answer questions, AI agents can make decisions, use tools (like calculators or databases), and interact with external systems to accomplish goals.
In a city setting, an AI agent could be designed to handle a resident's request end-to-end, for example, taking a 311 complaint, checking whether it's a duplicate, creating a work order, and sending a follow-up message. Some agents are designed for internal use, like drafting and emailing a weekly update to department heads using data pulled from multiple reports. AI agents represent a more advanced, goal-oriented form of AI, closer to a virtual coworker than a search box.
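Conceptually, many agents boil down to a loop: look at the goal, decide on the next action, use a tool, and repeat until the job is done. The sketch below is a heavily simplified, hypothetical version of that loop for the 311 example; decide_next_action() stands in for a language model choosing what to do, and the tools are placeholders.

```python
# A heavily simplified sketch of an agent loop. decide_next_action() stands in
# for a language model choosing the next step; the tools are hypothetical.

def check_duplicate(complaint: str) -> bool:
    return False                                # placeholder tool

def create_work_order(complaint: str) -> str:
    return "WO-1042"                            # placeholder tool

def send_followup(work_order: str) -> None:
    print(f"Follow-up sent for {work_order}")   # placeholder tool

def decide_next_action(state: dict) -> str:
    # In a real agent, a model would pick the next action based on the goal
    # and what has happened so far. Here the decision logic is hard-coded.
    if "duplicate_checked" not in state:
        return "check_duplicate"
    if "work_order" not in state:
        return "create_work_order"
    if "followup_sent" not in state:
        return "send_followup"
    return "done"

state = {"complaint": "Streetlight out at 5th and Main"}
while (action := decide_next_action(state)) != "done":
    if action == "check_duplicate":
        state["duplicate_checked"] = check_duplicate(state["complaint"])
    elif action == "create_work_order":
        state["work_order"] = create_work_order(state["complaint"])
    elif action == "send_followup":
        send_followup(state["work_order"])
        state["followup_sent"] = True
```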
Ethics, Trust, and Oversight
Bias
Bias in AI refers to systematic and unfair patterns in how an AI system behaves, often leading to unequal or inaccurate outcomes for certain groups of people. This happens when the data used to train the model reflects existing inequalities, stereotypes, or gaps in representation. Since AI learns patterns from historical data, it can unintentionally repeat or amplify biases that already exist in society.
For example, if a model is trained mostly on building permit data from high-income neighborhoods, it may be less accurate or useful when reviewing applications from underserved areas. Or if a public-facing chatbot was trained only on formal legal language, it might struggle to understand or fairly respond to questions written in informal or multilingual ways.
In a city context, AI bias can affect everything from code enforcement to resource allocation to resident communication. That’s why it’s critical to evaluate training data, monitor outcomes, and keep humans in the loop, especially in decisions that impact people’s lives or rights. Addressing bias is both a technical challenge and a matter of public trust and equity.
Explainability and Confidence Scores
Explainability refers to how well an AI system can describe or justify the decisions it makes. Unlike traditional software, where rules are programmed explicitly, AI models often make predictions based on patterns in data that may not be obvious to humans. Explainability tools aim to bridge that gap by showing which parts of the input influenced the AI’s output or by breaking down the steps it followed to reach a conclusion.
In a city context, explainability is crucial for transparency and accountability. If an AI model recommends denying a permit or prioritizing a service request, staff and residents need to understand why. Was it due to location? Policy? Risk? Clear explanations help build trust in AI tools and ensure that decisions can be reviewed and, if necessary, challenged or corrected. This is especially important in public-facing services or areas with legal implications.
A confidence score is a numerical value that tells you how certain the AI is about its response or prediction. It doesn’t guarantee correctness, but it signals how confident the model is based on what it has learned. For example, an AI might say it’s 92% confident that a document is a public hearing notice, or only 60% sure about how to classify a zoning request.
Confidence scores help city staff know when to trust an AI’s answer and when to double-check. They’re particularly useful in workflows that still involve human oversight, like inspections, document review, or chatbots handling public questions. If the confidence is high, the system might move forward automatically. If it’s low, it can trigger a manual review. In this way, confidence scores make AI tools safer, more reliable, and easier to integrate into real-world decision-making.
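In code, using a confidence score often looks as simple as a threshold check, as in this sketch. The classify() function is a hypothetical stand-in for an AI model, and the 0.9 cutoff is an example; real systems tune the threshold to the risk of the task.

```python
# A sketch of routing work based on a confidence score. classify() is a
# hypothetical stand-in for an AI model, and the 0.9 threshold is an example.

def classify(document: str) -> tuple[str, float]:
    # Placeholder: a real model would return a label and a confidence score.
    return ("public hearing notice", 0.92)

label, confidence = classify("...document text...")

if confidence >= 0.9:
    print(f"Auto-filed as '{label}' (confidence {confidence:.0%})")
else:
    print(f"Low confidence ({confidence:.0%}), sending to staff for manual review")
```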
Human-in-the-Loop
Human-in-the-loop (HITL) is an approach to using AI where people remain involved in the decision-making process, either by reviewing, approving, correcting, or guiding the AI’s actions. Rather than letting the system operate fully on its own, HITL ensures that a human checks or collaborates with the AI at key points in the workflow.
This is especially important in city government, where decisions often affect real people, public trust, and legal outcomes. For example, an AI system might flag potentially incomplete permit applications, but a staff member makes the final call. Or a chatbot might draft a response to a resident’s complaint, but a human reviews it before it's sent. Human-in-the-loop helps balance the speed and efficiency of automation with the judgment, accountability, and empathy of human oversight.
Data Privacy
Data privacy refers to the protection of personal, sensitive, or confidential information from unauthorized access, use, or sharing, especially when that data is used in AI systems. In the public sector, this includes things like resident contact details, utility usage, property records, health data, and service requests. When cities use AI to automate tasks or analyze data, they must ensure that individual privacy is respected and that data is handled securely and responsibly.
For example, if a city chatbot helps residents with billing questions, it might need access to account information. But it should never expose that data to the wrong person or use it for unrelated purposes. Likewise, if AI is analyzing complaint patterns or permit activity, it should do so in ways that anonymize individual identities where possible. Strong privacy practices also help cities comply with laws and build trust with their communities. This includes clear data policies, secure systems, and transparency about how data is collected, used, and stored in any AI-powered service.
Model Alignment and Testing
Model alignment is the process of ensuring that an AI system behaves in ways that match human values, goals, and expectations, especially in high-stakes or public-facing applications. Alignment goes beyond technical accuracy and looks at whether the model is producing helpful, ethical, and context-appropriate responses. For cities, this might mean making sure an AI tool answers residents respectfully, follows policy guidelines, avoids biased language, and stays within legal or regulatory boundaries.
For example, an AI assistant for zoning questions should not only give correct information but also avoid speculation, respect confidentiality, and defer to staff when uncertain. Alignment helps ensure that the AI supports the public good, reflects the city's mission, and doesn’t unintentionally undermine trust or fairness.
Model testing is the practice of evaluating how well an AI model performs before it's deployed in the real world. This includes testing for accuracy, reliability, fairness, and safety under different conditions. In a city environment, that might involve running sample prompts through a chatbot, checking how it handles sensitive questions, or making sure it doesn’t generate false or misleading responses.
Testing can also include scenario-based checks (like asking how the model responds to complaints, emergencies, or edge cases) as well as continuous monitoring once the model is live. Good testing helps prevent unexpected behavior, reduces the risk of public confusion or harm, and ensures the system can be trusted as part of city operations. It's a key part of any responsible AI implementation strategy.
Hallucination
Hallucination in AI refers to a situation where a model generates information that sounds correct but is actually false, misleading, or made up. This happens because AI models like ChatGPT don’t truly "know" facts. They generate responses based on patterns in the data they were trained on, not verified knowledge.
For example, a model might confidently invent a regulation that doesn’t exist or cite a meeting date that never happened. In a city context, hallucinations can be especially problematic if residents or staff rely on the AI for accurate information about policies, deadlines, or procedures.
That’s why it's important to keep humans in the loop, verify critical outputs, and use tools like retrieval-augmented generation (RAG) to ground AI responses in real, trusted documents. Preventing hallucinations is key to keeping AI tools useful and trustworthy in government settings.
Guardrails
Guardrails are built-in safety measures that help control and limit what an AI system can say or do, reducing the risk of harmful, inappropriate, or misleading outputs. They act like boundaries that keep the model's behavior aligned with its intended use, whether that’s staying on topic, avoiding sensitive content, or refusing to answer certain questions.
In practice, guardrails might include content filters, restrictions on certain types of responses, or rules that prevent the AI from making legal, medical, or financial claims. For city governments, guardrails are essential when deploying AI tools that interact with the public. They help ensure that a chatbot won’t give bad policy advice, escalate a sensitive issue inappropriately, or provide inaccurate information about deadlines or services. Guardrails help make AI safer, more predictable, and more aligned with civic values and responsibilities.
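A very simple guardrail can be implemented as a check that runs before a chatbot's drafted answer goes out, as in this sketch. The blocked topics and refusal message are placeholder examples; production systems typically layer several kinds of filtering and policy rules on top of one another.

```python
# A very simple sketch of a guardrail: screen a drafted chatbot reply before
# it is sent. The blocked topics and refusal text are placeholder examples.

BLOCKED_TOPICS = ["legal advice", "medical advice", "financial advice"]

def apply_guardrails(draft_reply: str) -> str:
    lowered = draft_reply.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return ("I can't help with that topic. Please contact the appropriate "
                "city department for assistance.")
    return draft_reply

print(apply_guardrails("Here is some legal advice about your dispute..."))
print(apply_guardrails("Trash pickup in Zone 3 is on Tuesdays."))
```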
Evaluations
Evaluations in the context of building an app with AI refer to the process of systematically testing and measuring how well the AI performs specific tasks within the application. This includes assessing the quality, accuracy, safety, speed, and consistency of the AI’s responses or actions in real-world scenarios. Evaluations help developers and city teams understand whether the AI is working as intended and identify areas where it may need improvement or added oversight.
For example, if a city builds a chatbot to answer public questions about garbage pickup or zoning rules, evaluations might involve feeding it a range of real resident questions to see how accurate and helpful the answers are. Teams might measure whether the responses match official policy, whether the tone is appropriate, or whether the AI knows when to escalate to a human.
Evaluations are critical for building trust, ensuring legal compliance, and making sure the AI adds real value to users.
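Here's a minimal sketch of what an evaluation harness can look like: a set of test questions with expected answers, run through the system, with a simple score at the end. The ask_chatbot() function is a hypothetical placeholder for the tool under test, and real evaluations usually measure much more than exact matches (tone, policy accuracy, escalation behavior, and so on).

```python
# A minimal sketch of an evaluation harness. ask_chatbot() is a hypothetical
# placeholder for the system under test; real evaluations typically go beyond
# exact-match scoring (tone, policy accuracy, escalation behavior, etc.).

test_cases = [
    {"question": "What day is garbage pickup in Zone 3?", "expected": "Tuesday"},
    {"question": "How tall can a residential fence be?", "expected": "six feet"},
]

def ask_chatbot(question: str) -> str:
    return "Garbage in Zone 3 is collected every Tuesday."  # placeholder response

passed = 0
for case in test_cases:
    answer = ask_chatbot(case["question"])
    if case["expected"].lower() in answer.lower():
        passed += 1
    else:
        print(f"FAILED: {case['question']!r} -> {answer!r}")

print(f"{passed}/{len(test_cases)} checks passed")
```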
About this reference
This is intended to be a living reference document. We'll keep it updated over time as new terms or issues related to AI use in cities arise.