Transforming Trust: The Essential Role of Reliable Evaluations in AI and Language Models

January 7, 2026

Dr. Lu Wang

How can we ensure that artificial intelligence tools are trustworthy and reliable? Lu Wang, an associate professor of computer science and engineering at the University of Michigan, was ready to answer that question at the November 2025 Dean’s Spotlight Series talk.

The talk, sponsored by SCI’s Intelligent Systems Program, brought together the program’s experts on AI as well as other Pitt community members to explore how to take action to ensure the reliability of tech that has quickly become part of everyday life.

“When these systems are inaccurate, unsafe, or biased, the consequences scale quickly, from spreading misinformation to reinforcing inequities or producing harmful actions through autonomous agents,” said Wang about large language models (LLMs) and other AI tools.

According to Wang’s research, many current evaluations of LLMs fall short: they suffer from data contamination, focus too narrowly on short question answering, or generally fail to reflect the complexity and reasoning required in real-world settings.

That’s where researchers like Wang and SCI graduates can bring their expertise to address the gap.

“We need evaluation methods that are more dynamic, scenario-based, and aligned with actual user tasks and contexts,” Wang explained. “Prioritizing evaluation in these areas is essential for ensuring that AI systems behave responsibly not just in the lab, but wherever they are deployed.”

According to Wang, what counts as trustworthy should evolve with the technology itself, as “new technologies reshape how information is produced, delivered, and used.” Trust in LLMs, Wang said, is no longer just about accuracy; it’s about transparency, robustness to manipulation, alignment with human values, and predictable behavior in complex situations.

Through the Intelligent Systems Program, researchers at SCI and across the University are coming together to address this technology in various fields and ways, preparing students to think about AI in an evolving world.

Wang’s research into LLMs began through her work with natural language processing. After seeing how LLMs trained on vast amounts of data could understand, generate, and reason, Wang was both inspired and transformed.

“Expectations rise as tools become more powerful, and society demands clearer evidence that systems will act responsibly at scale,” Wang said. “The progress of technologies doesn’t just change what we can do with them; it also changes what we believe they must do to earn and maintain trust from humans.”

As for the broader impact of AI and LLMs, the field needs people who both understand the technology and are dedicated to keeping it trustworthy, ethical, and reliable.

“AI systems increasingly shape the information we see and the decisions we make,” said Wang. “Knowing their underlying mechanisms and limitations helps us interpret their outputs responsibly and choose the right model for the right task.”

View a recording of Wang’s talk, “Building Trustworthy Large Language Models: Methods and Evaluation.”