23.09.2025: URA at NVIDIA AI Day Ho Chi Minh City 2025

A full day at Sheraton Saigon (09/23/2025) showed that AI in Vietnam has shifted from technical demos to standardized pipelines, inference cost optimization, and in-depth discussions about “sovereign AI.” This post also summarizes what the URA team learned.

On 09/23/2025, members of the URA research group attended NVIDIA AI Day at Sheraton Saigon Grand Opera. The program ran all day at a moderate pace, interleaving:

Keynotes/technical sessions: updates on agentic AI, AI factory, inference optimization, and observability.
Hands-on/Lab (DLI): learn-by-doing, closely tied to real scenarios.
Demo & networking area: where engineering teams, startups, and infrastructure providers shared “fix the pain you actually have” lessons.

What impressed us most was the “do it for real” spirit. From data governance and model selection to measuring cost and latency in production, the sessions spent less time on ideas and more on concrete practice.

What the team learned

Leaving the conference room, the whole team felt we learned a lot—and most notably, a new way of looking at agentic AI: it’s not just a chatbot that answers better, but a system that can plan, call the right tools, and watch for risks on its own. When pieces like Planner, Retriever/Memory (RAG + vector store), Tooling, Guardrails/Policy, and Tracing/Observability are put together, the reasoning → action flow becomes transparent and controllable—very different from previously “guessing what the model was doing.”
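To make the plan → tool call → guardrail → trace flow concrete, here is a minimal sketch of such a loop. It is only an illustration under our own assumptions: the tool names, the `plan` heuristic, and the blocklist are hypothetical, and a real planner would be driven by an LLM rather than a string check.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> function from string to string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for {q!r}",
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}

# Hypothetical guardrail policy: tools the agent may never call.
BLOCKED_TOOLS = {"shell"}


def plan(task: str) -> List[Tuple[str, str]]:
    """Toy planner: returns (tool, input) steps. A real one would ask an LLM."""
    if "+" in task:
        return [("calculator", task)]
    return [("search", task)]


def run_agent(task: str) -> Tuple[List[str], List[dict]]:
    """Run every planned step and record a trace entry for each one."""
    trace: List[dict] = []    # observability: one record per step
    outputs: List[str] = []
    for tool_name, tool_input in plan(task):
        allowed = tool_name in TOOLS and tool_name not in BLOCKED_TOOLS
        trace.append({"tool": tool_name, "input": tool_input, "allowed": allowed})
        if allowed:
            outputs.append(TOOLS[tool_name](tool_input))
    return outputs, trace


outputs, trace = run_agent("2+3")
print(outputs)
print(trace)
```

Because every step is written to `trace` before the tool runs, the reasoning → action path can be inspected afterwards instead of guessed.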
At the same time, we absorbed the idea of packaging models as services: using NIM/microservices to standardize APIs and bring inference into CI/CD like other services. Separating the model layer from the application, tracking cost per 1,000 requests, and reusing building blocks (LLM/vision/speech) clearly helps reduce technical debt, especially when model versions need to be swapped or when A/B testing different configurations or prompts.
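As a rough illustration of tracking cost per 1,000 requests when comparing two configurations, the sketch below divides total GPU cost by the number of requests served. All figures (request counts, GPU hours, price per GPU hour) and the configuration names are made up for the example, and a real calculation would likely include more cost components than GPU time.

```python
from dataclasses import dataclass


@dataclass
class InferenceStats:
    """Aggregated numbers for one model configuration (all values hypothetical)."""
    name: str
    requests: int           # requests served in the measurement window
    gpu_hours: float        # GPU time consumed in that window
    gpu_hour_price: float   # assumed price per GPU hour, in USD


def cost_per_1000_requests(stats: InferenceStats) -> float:
    """Total GPU cost divided by requests, scaled to 1,000 requests."""
    if stats.requests <= 0:
        raise ValueError("requests must be positive")
    total_cost = stats.gpu_hours * stats.gpu_hour_price
    return total_cost / stats.requests * 1000


# Two hypothetical configurations in an A/B comparison.
config_a = InferenceStats("config-a", requests=200_000, gpu_hours=10.0, gpu_hour_price=2.0)
config_b = InferenceStats("config-b", requests=200_000, gpu_hours=8.0, gpu_hour_price=2.0)

for cfg in (config_a, config_b):
    print(f"{cfg.name}: ${cost_per_1000_requests(cfg):.2f} per 1,000 requests")
```

Keeping the metric in one small function makes it easy to log alongside latency for each model version or prompt variant.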

Finally, the Sovereign AI discussion was very direct: to build AI for Vietnamese data, you must control your data and infrastructure. From classifying data (public/sensitive/strict) to on-prem training/fine-tuning with auditable pipelines, then multi-stage deployment (dev–staging–prod) with an “internet-off” option when needed—all of this forms a serious operational framework so AI can go into production safely and in compliance.
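One way to picture the link between data classification and deployment constraints is a small policy table. The sketch below is purely illustrative: the mapping from each class to "on-prem only" and "internet allowed" is our own assumption, not a rule stated at the event, and a real policy would come from an organization's compliance requirements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DataClass(Enum):
    PUBLIC = "public"
    SENSITIVE = "sensitive"
    STRICT = "strict"


@dataclass(frozen=True)
class DeploymentPolicy:
    on_prem_only: bool      # training/fine-tuning must stay on local infrastructure
    internet_allowed: bool  # False models the "internet-off" option


# Hypothetical mapping, for illustration only.
POLICIES = {
    DataClass.PUBLIC: DeploymentPolicy(on_prem_only=False, internet_allowed=True),
    DataClass.SENSITIVE: DeploymentPolicy(on_prem_only=True, internet_allowed=True),
    DataClass.STRICT: DeploymentPolicy(on_prem_only=True, internet_allowed=False),
}


def policy_for(datasets: List[DataClass]) -> DeploymentPolicy:
    """Return the most restrictive policy required by any dataset in the job."""
    if not datasets:
        raise ValueError("at least one dataset classification is required")
    return DeploymentPolicy(
        on_prem_only=any(POLICIES[d].on_prem_only for d in datasets),
        internet_allowed=all(POLICIES[d].internet_allowed for d in datasets),
    )


print(policy_for([DataClass.PUBLIC, DataClass.STRICT]))
```

Taking the most restrictive setting across all datasets means a single strictly classified source is enough to force an on-prem, offline deployment.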

Team Spotlight

In the talk “Building AI Factories for the Next Industrial Revolution” by Dr. Ettikan Karuppiah (Regional Chief Technology Officer for South Asia–Pacific, NVIDIA), URASys (Unified Retrieval Agent-Based System), developed by the research group at Ho Chi Minh City University of Technology, was mentioned as an illustrative case of the AI agent application trend in Vietnam.

URASys was the only project in this presentation authored by undergraduate and graduate students; the other two projects came from industry. This is a great honor for the team, whose research was recognized and presented to a wide audience. It also opens up promising opportunities for connection, collaboration, and development for students, graduate students, and faculty of Ho Chi Minh City University of Technology in the field of artificial intelligence.

The team consists of two third-year students, Nguyễn Song Thien Long and Vo Thi Nhu Quynh, and two master’s students, Le Hoang Anh Tai and Huynh Tieu Phung, under the scientific supervision of Assoc. Prof. Dr. Quan Thanh Tho, Head of the Faculty of Computer Science and Computer Engineering.

URASys is a multi-agent AI architecture built on Retrieval-Augmented Generation (RAG) that handles multiple tasks, from orchestration and search to retrieval and answer generation. The system has the following key features:

Manager Agent: analyzes and breaks down questions, and orchestrates the other agents.

FAQ Agent + Document Agent: two parallel retrieval pipelines. The FAQ Agent searches a database of frequently asked questions, while the Document Agent retrieves from official textual documents.

RAG – Retrieval-Augmented Generation: when a user asks a question, the system first retrieves relevant text passages from the database, then feeds these passages to the AI model to generate an answer.

“Just Enough” principle: the system returns an answer only when sufficient evidence has been gathered; if the query is ambiguous, it asks the user for clarification; if no data is available, it responds that the question is unanswerable (no information found).

Two-stage indexing: Chunk-and-Title (creating passages rich in keywords and semantics) and Ask-and-Augment (creating multiple question–answer paraphrase pairs), which optimizes retrieval effectiveness, especially in Vietnamese.
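The features above can be sketched as a small decision routine. This is not the URASys implementation: the `Passage` type, the relevance threshold, the order of the checks, and the stub retrievers are all assumptions made for illustration. It only shows how a manager might merge results from two retrieval pipelines and apply a “Just Enough” rule with three outcomes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Outcome(Enum):
    ANSWER = "answer"
    CLARIFY = "clarify"
    UNANSWERABLE = "unanswerable"


@dataclass
class Passage:
    source: str    # "faq" or "document"
    text: str
    score: float   # hypothetical relevance score in [0, 1]


def just_enough(passages: List[Passage], is_ambiguous: bool,
                min_score: float = 0.6) -> Tuple[Outcome, List[Passage]]:
    """Answer only with strong evidence; otherwise clarify or decline."""
    if is_ambiguous:
        return Outcome.CLARIFY, []
    strong = [p for p in passages if p.score >= min_score]
    if not strong:
        return Outcome.UNANSWERABLE, []
    # Best evidence first, so a generator sees the strongest passage at the top.
    return Outcome.ANSWER, sorted(strong, key=lambda p: p.score, reverse=True)


def manager(query: str,
            faq_retrieve: Callable[[str], List[Passage]],
            doc_retrieve: Callable[[str], List[Passage]],
            is_ambiguous: Callable[[str], bool]) -> Tuple[Outcome, List[Passage]]:
    """Query both retrieval pipelines, merge the results, then apply the rule."""
    passages = faq_retrieve(query) + doc_retrieve(query)
    return just_enough(passages, is_ambiguous(query))


# Stub retrievers standing in for the FAQ and Document agents.
faq = lambda q: [Passage("faq", "Tuition is paid each semester.", 0.9)]
doc = lambda q: [Passage("document", "Unrelated regulation text.", 0.2)]

outcome, evidence = manager("When is tuition paid?", faq, doc, lambda q: False)
print(outcome, [p.source for p in evidence])
```

In this sketch only passages above the threshold reach the answer stage, so a weak match from one pipeline cannot dilute a strong match from the other.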

Leaving NVIDIA AI Day HCMC 2025, we are even more convinced that the gap between a “great demo” and a “good product” lies in engineering discipline: standardized pipelines, serious data governance, transparent measurement, and choosing the right infrastructure “bricks.” With what we learned, the URA team is confident about moving from POC to production faster and more reliably.
