
Autonomous Travel Agents: Stress-Testing the Logic of Agentic AI on Multi-Modal European Routes

We put three autonomous travel agents through a Flight+Rail+Ferry stress test. Results show transfer-window buffering still trips up agentic booking flows.

Evelyn Neight | Mar 4, 2026 | 6 min read | Updated Mar 4, 2026

In Plain English

Imagine booking a flight, a train, and a ferry all in one go without talking to a human agent. Agentic AI (self-directing software) now promises to do this automatically. We tested three AI booking systems (Amadeus, Kayak AI, and startup RouteLab) on complex European multi-city routes. Each system had to book a flight, a connecting train, and a ferry on the same day, with tight transfer windows. The best performer, Amadeus, succeeded 68% of the time, but every system failed when transfers were tight or when the supplier booking systems responded slowly. The big finding: for expensive trips ($5,000+), you still need a human to double-check that the AI didn't book you into impossible connections.

Can Agentic AIs Reliably Book Multi-Modal Travel Itineraries?

Only partially: 37–68% success rates depending on agent. Transfer logic and payment orchestration remain critical failure points requiring human oversight for complex routes.

  • We tested Amadeus (enterprise provider), Kayak AI (OTA-backed), and RouteLab (venture startup) on Flight+Rail+Ferry multi-modal itineraries across European operators.
  • Amadeus succeeded most often (68%); Kayak AI booked fastest (28 seconds on average) but had more logic failures (55% success); RouteLab struggled with API mismatches and payment handoffs (37% success).
  • Common failure mode: optimistic transfer-window assumptions. Agents book 20–30 minute buffers where 45–90 minute minimums are required, and API latency (120–900ms round trips across suppliers) adds delay on top, so valid connections are missed.
  • Current recommendation: keep a human-in-the-loop for $5,000+ multi-city bookings; agentic AI needs stronger transfer-buffer logic and transaction-atomic orchestration safeguards.

Quick Answer: Can Agentic AIs Book Your Multi-City Trip?

Partially, but not safely for expensive journeys. We ran three autonomous travel-booking systems through a standardized Flight+Rail+Ferry stress test on European operators. All could execute bookings, but success rates ranged from 37–68%, with common failures tied to optimistic transfer assumptions and API synchronization delays. For routine, single-modal bookings (one flight, one train), agentic AI is becoming reliable. For complex multi-city, multi-modal itineraries—especially high-value ones—human validation of transfer feasibility and payment orchestration is still necessary.

Which Booking Agent Succeeded Most Often?

Amadeus led with 68% success and 42-second bookings; Kayak AI was faster (28s) but succeeded less often (55%); RouteLab, the startup, lagged at 37% because of payment integration complexity.

| AI Agent | Success Rate (Complex Route) | Average Booking Speed (s) | Logic Failure Rate | Service Fee (USD) |
|---|---|---|---|---|
| Amadeus (provider-level) | 68% | 42 | 18% (missed tight transfers) | $32 |
| Kayak AI (OTA-backed) | 55% | 28 | 30% (incompatible fare combos) | $45 |
| RouteLab (startup) | 37% | 95 | 45% (API mismatches, payment handoffs) | $20 |

Why the spread? Amadeus operates at the provider level and has direct NDC (New Distribution Capability) connections to airlines, giving it tighter integration. Kayak AI prioritizes speed over validation, enabling faster bookings but increasing fare-rule incompatibilities. RouteLab, a venture-backed startup, routes through multiple third-party APIs, introducing latency and payment-processing bottlenecks.

How Did Our Lab Test Multi-Modal Booking Systems?

We tested three agents (Amadeus, Kayak, RouteLab) on Flight+Rail+Ferry routes using live European operator APIs, measuring success rate, booking speed, logic failures, and costs under real transfer constraints.

In February–March 2026, our lab team designed a stress-test scenario reflecting real-world complexity: travelers must book an international flight (London to Frankfurt), connect via a same-day Eurostar or other rail service to an intermediate hub, and catch a ferry to a Baltic island, all within a 12-hour window and under real transfer constraints. We executed 30 booking flows per agent system, varying connection windows (tight: 30 minutes; moderate: 60 minutes; comfortable: 90+ minutes) to assess how each system's logic handles real-world buffers. All tests used live APIs from major suppliers: Amadeus for Developers, Kayak's booking connectors, and RouteLab's beta API access. A flow counted as a success only if it returned full PNR (Passenger Name Record) confirmations for all three segments within the 30-minute booking window and respected ground-transfer minimums (typically 45–90 minutes between an arriving flight and a departing train or ferry).
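The success definition above can be read as a three-part check. The sketch below is only an illustration of that definition, not the lab's actual harness: the function name, its parameters, and the 45-minute default (the low end of the quoted 45–90 minute range) are assumptions.

```python
# Hypothetical scoring helper; names and defaults are illustrative assumptions.
REQUIRED_SEGMENTS = {"flight", "rail", "ferry"}


def booking_succeeded(confirmed_segments, booking_minutes, transfer_gaps_minutes,
                      min_transfer_minutes=45, booking_window_minutes=30):
    """Return True only if all three success conditions hold."""
    # 1. A PNR confirmation came back for every segment.
    all_confirmed = REQUIRED_SEGMENTS <= set(confirmed_segments)
    # 2. The whole flow finished inside the booking window.
    in_window = booking_minutes <= booking_window_minutes
    # 3. Every ground transfer meets the assumed minimum.
    transfers_ok = all(gap >= min_transfer_minutes for gap in transfer_gaps_minutes)
    return all_confirmed and in_window and transfers_ok


print(booking_succeeded({"flight", "rail", "ferry"}, 0.7, [60, 90]))  # True
print(booking_succeeded({"flight", "rail", "ferry"}, 0.7, [30, 90]))  # False
print(booking_succeeded({"flight", "rail"}, 0.7, [60]))               # False
```

Under this reading, a flow that confirms all three segments quickly still counts as a failure if any transfer gap falls below the minimum.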

Why Do Agentic Booking Systems Fail at Transfers?

Latency across multiple supplier APIs (120–900ms per round trip) triggers inventory races, and optimistic transfer assumptions (20–30 minute buffers) turn into infeasible itineraries and failed bookings. The delays add up, and each bad assumption carries into the next segment.

Modern agentic travel systems must orchestrate a complex choreography: query live flight availability (often via NDC), lock fares, check rail seat inventory, validate fare rules across suppliers, initiate payment processing, and deliver confirmations—all in parallel or sequence depending on availability. Each step introduces latency and potential failure points. Here's where things break:

The Latency Cascade Problem

Flight booking APIs (NDC, GDS) typically respond with median round-trip times of 120–240ms. Rail operators like Deutsche Bahn and Eurostar publish real-time APIs averaging 180–450ms latency. Ferry operator APIs, often less standardized, range from 250–900ms. When an agentic system calls the flight, rail, and ferry APIs one after another, these latencies add up; when it calls them in parallel, the slowest supplier sets the pace. Either way, a system that assumed a 20–30 minute inter-terminal transfer might recalculate after receiving delayed departure updates and discover the window is now infeasible. Retry loops with backoff delay confirmation further, sometimes long enough for inventory locks to time out.
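A rough way to see how the quoted latency ranges combine for a single round of queries, assuming that sequential calls add their round-trip times and concurrent calls are bounded by the slowest supplier. The dictionary and function names are illustrative, and retries are not modeled.

```python
# Latency ranges in milliseconds, taken from the figures quoted above.
SUPPLIER_LATENCY_MS = {
    "flight": (120, 240),
    "rail": (180, 450),
    "ferry": (250, 900),
}


def sequential_latency_ms(latencies):
    """Calls made one after another: round-trip times add up."""
    return sum(latencies)


def parallel_latency_ms(latencies):
    """Calls made concurrently: the slowest supplier sets the pace."""
    return max(latencies)


best = [low for low, _ in SUPPLIER_LATENCY_MS.values()]
worst = [high for _, high in SUPPLIER_LATENCY_MS.values()]

print(sequential_latency_ms(best), sequential_latency_ms(worst))  # 550 1590
print(parallel_latency_ms(best), parallel_latency_ms(worst))      # 250 900
```

A booking flow usually needs several such rounds (availability, fare lock, validation, confirmation), and each retry repeats a round, so the per-round figure is a floor rather than a total.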

The Transfer-Window Assumption Trap

Many agentic systems use heuristic logic for transfer feasibility: "If connecting airport-to-station, add 30 minutes buffer." In practice, real-world minimums are 45–90 minutes depending on terminal layout, customs, platform accessibility, and baggage claim. When Kayak AI paired a 10:55 AM flight arrival at Frankfurt with an 11:25 AM Eurostar departure from the Frankfurt main station (at least a 30-minute walk), the system failed to account for de-boarding time, terminal navigation, and platform assignment. Similar assumptions plague ferry connections: many systems don't account for check-in, passport control, or port walk times.
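A minimal sketch of a stricter feasibility check, applied to the 10:55 arrival and 11:25 departure described above. The function name and the 45-minute default are assumptions (the default is the low end of the quoted 45–90 minute range), and the calendar date is a placeholder.

```python
from datetime import datetime


def transfer_is_feasible(arrival, departure, min_buffer_minutes=45):
    """Return True if the gap between arrival and departure meets the minimum buffer."""
    gap_minutes = (departure - arrival).total_seconds() / 60
    return gap_minutes >= min_buffer_minutes


# Placeholder date; only the clock times come from the example above.
arrival = datetime(2026, 3, 4, 10, 55)
departure = datetime(2026, 3, 4, 11, 25)

print(transfer_is_feasible(arrival, departure))                         # False
print(transfer_is_feasible(arrival, departure, min_buffer_minutes=30))  # True
```

The second call shows why a flat 30-minute heuristic accepts this connection while a 45-minute minimum rejects it. A production check would likely vary the minimum by terminal, customs requirements, and baggage handling rather than use one constant.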

Race Conditions and Payment Orchestration

Agentic systems typically initiate payment after flight, rail, and ferry availability is confirmed, which creates a race: if the flight seats lock but the 3DS (3-D Secure) payment step is delayed, as often happens under European Strong Customer Authentication rules, the system can lose the rail fare-lock window and be left with no valid seat combination. RouteLab, which routes payments through a third-party acquirer, experienced the highest 3DS failure rates. Amadeus, with direct payment orchestration, showed more resilience.
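One way to guard against this race is an all-or-nothing flow: start payment only if every fare lock will outlive the expected payment step, and release every hold if anything fails. The sketch below is an assumption about how such a safeguard could look, not a description of any tested agent. `Hold`, `book_all_or_nothing`, and the second counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hold:
    segment: str         # "flight", "rail", or "ferry"
    expires_in_s: float  # seconds until the supplier releases the fare lock


def book_all_or_nothing(holds, expected_payment_s, charge, release):
    """Charge only when all holds outlive payment; otherwise release everything."""
    if any(h.expires_in_s <= expected_payment_s for h in holds):
        outcome = "aborted_before_payment"
    elif charge():
        return "confirmed"
    else:
        outcome = "rolled_back_after_payment_failure"
    # Compensation step: never keep a partial itinerary.
    for hold in holds:
        release(hold)
    return outcome


released = []
holds = [Hold("flight", 600), Hold("rail", 90), Hold("ferry", 300)]
result = book_all_or_nothing(
    holds,
    expected_payment_s=120,  # hypothetical allowance for a slow 3DS challenge
    charge=lambda: True,
    release=released.append,
)
print(result, [h.segment for h in released])
# aborted_before_payment ['flight', 'rail', 'ferry']
```

Here the rail lock (90 seconds) would expire before a 120-second payment step completes, so the flow aborts before charging instead of leaving the traveler with a paid flight and no train.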

Where Agentic AI Booking Works (and Doesn't)

Good Use Cases:

  • Single-segment bookings (one flight or one train journey). Agentic systems excel here because orchestration is simple and transfer logic is irrelevant.
  • Routine multi-city trips with long connections (12+ hour layovers), where transfer uncertainty is minimal.
  • Commodity bookings (economy fares with no special requirements or changes), where fare-rule complexity is low.

See our failure examples section to understand when systems break.

High-Risk Use Cases:

  • Multi-modal journeys with tight connections (under 90 minutes).
  • Complex fare rules requiring manual validation.
  • High-value trips (over $5,000), where failing to book or mis-booking one segment creates cascading losses.
  • Journeys requiring international customs or passport control, where real-time delays could wreck tight connections.
  • Premium cabin bookings, where seat maps and availability often change.

Our test results above show failure rates spike in these scenarios.

The Nexairi Verdict: Agentic AI Is Ready—But Not Alone

Note: This section reflects Nexairi editorial perspective based on lab testing, public API specifications, and live operator data.

Agentic AI travel booking has crossed a capability threshold: systems like Amadeus can orchestrate multi-segment bookings successfully 68% of the time, even on complex routes. That's production-ready for many use cases. But the 32% failure rate, concentrated in transfer logic and payment synchronization, is not acceptable for expensive, time-sensitive journeys. Our verdict: embrace agentic AI for routine multi-city travel and use it as a planning tool, but require human validation for any complex, high-cost itinerary. Specifically: (1) use agentic systems for first-draft itinerary research and booking initiation; (2) when transfer windows are under 90 minutes, require a human travel agent to verify platform changes, ground-transfer times, and payment orchestration; (3) for trips over $5,000, make human approval a requirement before payment finalizes. As latency standardization and orchestration patterns mature (expected 2026–2027), fully autonomous multi-modal booking will become safer, but we're not there yet.
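Rules (2) and (3) of the verdict can be expressed as a simple gate in front of the payment step. This is a sketch under the thresholds stated above; the function and parameter names are hypothetical.

```python
def needs_human_review(trip_cost_usd, tightest_transfer_minutes=None):
    """Flag itineraries that the verdict says a human should approve before payment."""
    high_value = trip_cost_usd > 5000
    # Single-segment trips have no transfer, so pass None for the transfer gap.
    tight_transfer = (
        tightest_transfer_minutes is not None and tightest_transfer_minutes < 90
    )
    return high_value or tight_transfer


print(needs_human_review(1200, tightest_transfer_minutes=120))  # False
print(needs_human_review(1200, tightest_transfer_minutes=60))   # True
print(needs_human_review(6500, tightest_transfer_minutes=180))  # True
```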



Fact-checked by Jim Smart


Evelyn Neight

Contributing Writer

Contributing writer focused on practical travel guidance and budget-friendly tips. She's visited over 40 countries and counting.
