Modernizing online travel experiences is a huge engineering challenge. Here’s how Sabre tackled it head-on.
By Suresh Vellanki, Senior Vice President of Software Engineering, Sabre
The online travel industry is undergoing a significant transformation as it looks to give travelers more frictionless, intuitive, and personalized experiences. Delivering this is particularly difficult in an industry that remains a complex web of stakeholders, with systems, code, and data standards that were sometimes designed decades ago.
Online travel agencies (OTAs) in particular understand the need for modernization: they operate in a highly competitive market, under constant pressure to scale and innovate to outpace incumbents and new entrants alike. But they can’t do this alone. OTAs rely on the experience of technology providers to complement their internal expertise.
To meet this challenge head-on at Sabre, a global travel technology provider, we committed in 2019 to a large-scale project to address legacy code, modernize systems, and give OTAs the strong technological foundation and intelligent solutions they need.
Here are some lessons from that journey on how to set objectives and priorities; review approaches to tackle legacy code; think about people, process, and tech considerations; and continue innovating through the modernization process.
Setting legacy code objectives and priorities
When we started setting objectives for this transformation project, we needed to acknowledge the historical context of our legacy code. This helped us keep a positive and productive frame of mind, since legacy code is often dismissed as old, of little use, and difficult to manage.
However, before this code was labeled as ‘legacy’, it was generating functional and commercial value for the company, and many of these systems still provide considerable value today. Deciding what was legacy and had to be transformed, therefore, was an important part of determining the project scope.
Given the size of our codebase – which powers several hundred applications for our customers – the scale of our effort involved several thousand engineers and thousands of customer migrations over a number of years.
Here are three key factors, and some questions we considered when setting these objectives, determining scope, and prioritizing which legacy systems to tackle first:
1. Data processing and operational costs
- How much does it cost to maintain legacy code: infrastructure, tooling, and engineering time?
- How reliable are legacy systems and how does this compare to modern systems?
- What are the annualized downtime recovery and scalability costs?
2. Engineering resources and skills gaps
- What documentation and training resources are available for engineers to learn about legacy codebases?
- What skills and knowledge gaps exist that need filling?
- What is the split of engineering time spent on maintenance vs new product development?
3. Legacy code’s value to the business
- How much revenue do legacy systems generate or support, directly or indirectly?
- Do business leaders understand the technical realities of legacy systems and modernization efforts?
- Can legacy systems meet the needs of existing and upcoming customer service level agreements (SLAs) and revenue goals?
- What are the technical dependencies with internal and third-party systems?
Reviewing approaches to address legacy code
During objective setting, we had to decide which approaches were most appropriate. We thought about upfront and ongoing development costs, our five-to-ten-year strategy, the required sequence and speed of system modernization, and the cost of doing it ourselves compared to contracting development or buying a new solution off the shelf.
We reviewed tactics based on experience, publicly available frameworks (such as Gartner’s 5Rs of cloud migration), input from external development vendors, and insights from our strategic partnership with Google Cloud. These tactics included introducing an API layer, updating or rebuilding systems, buying new software, and building new solutions from scratch:
- Retain through encapsulating via APIs: Preserving the legacy system’s functionalities and making it accessible through a new API layer, allowing interaction with modern systems without exposing legacy complexities.
- Refactor: Restructuring existing code to improve its readability and efficiency, without changing its external behavior.
- Revise: Making incremental changes or updates to a legacy system to improve its performance, maintainability, or compatibility with newer technologies, without a complete overhaul.
- Rebuild: Recreating or redesigning a legacy system from the ground up, often in a new technology stack, while preserving its original scope and functionality.
- Replace: Completely substituting a legacy system or component with a new solution that better meets current functional, technological, and business requirements.
- Replatform/rehost: Moving an existing application to a new runtime environment or hosting platform, often to leverage better performance, scalability, or cost effectiveness, without making significant changes to the application’s core architecture or code.
- Integrating commercial off-the-shelf software: Incorporating pre-built software from a third party into the existing system to enhance or replace specific functionalities, reducing the need for custom development.
- Build new/next gen: Developing a brand-new solution with modern engineering practices and technologies to replace the legacy system or functionality.
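As a rough illustration of the “retain through encapsulation” tactic above, the sketch below wraps a legacy routine behind a modern, typed API layer so new consumers never touch legacy data formats. All names here (`legacy_fare_lookup`, `FareService`) are hypothetical, not Sabre’s actual systems:

```python
from dataclasses import dataclass

# Hypothetical legacy routine: returns a positional, cryptic record.
def legacy_fare_lookup(city_pair: str) -> list:
    # e.g. ["DFWLHR", "Y", 842.50] -> city pair, booking class, price
    return [city_pair, "Y", 842.50]

@dataclass
class FareQuote:
    origin: str
    destination: str
    booking_class: str
    price_usd: float

class FareService:
    """Modern API layer encapsulating the legacy system's quirks."""

    def get_quote(self, origin: str, destination: str) -> FareQuote:
        raw = legacy_fare_lookup(origin + destination)
        # Translate the positional legacy record into a typed object.
        return FareQuote(
            origin=raw[0][:3],
            destination=raw[0][3:],
            booking_class=raw[1],
            price_usd=raw[2],
        )

quote = FareService().get_quote("DFW", "LHR")
```

Modern systems interact only with `FareService`, so the legacy routine can later be revised or replaced behind the same interface.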
Some of the legacy code and technical debt drivers for this initiative included older versions of Java, high-compute-cost functions and maintainability burdens on the mainframe, the cost of operating old physical data centers, and a lack of autoscaling and automated security scans.
Due to the broad scope, we decided to focus primarily on replatforming/rehosting important high-value systems in the cloud for the fastest return on investment. In parallel, we employed other approaches, including revising and refactoring, for lower-priority systems where low effort could quickly deliver significant value.
Modernizing people, process, and tech
Once we had set objectives and identified the best approaches, we carefully planned how to manage the project at scale. The systems due for modernization had complex dependencies and powered in-production products used by airlines, hotels, and travel agencies, serving millions of customers every day.
Our plan had to account for people, process, and technology needs in order to win executive buy-in.
Technology transformation projects often fail because they do not adequately consider or plan for the people and skills required to carry out complex software projects.
We therefore focused heavily on training people with the skills and knowledge they needed for this project and for their future careers in a modern engineering organization. Training included courses on microservices, Google Cloud certification, and engineering productivity tools. In areas with skill gaps that we couldn’t fill ourselves, we collaborated with third-party development vendors and Google engineers to augment training and implementation.
We also updated some organizational structures to clarify and streamline roles and responsibilities. For example, we created a tech transformation project management office (PMO) to track, report progress, identify risks, and measure the journey across multiple workstreams.
Another example was our Center of Excellence (CoE) teams. Here, engineers and leaders with specialized knowledge and skill sets were grouped (e.g. cloud build, database solutions, customer migration) to provide support and educate others. This helped create a culture of collaboration and made it easy for engineers to know who to contact for questions on specific topics.
Consistent, predictable, and measurable processes were crucial throughout. We implemented strong governance through steering committees with clear remits to improve decision making. For example, an early decision was to deploy in a multi-zone, single-region cloud configuration before taking a phased approach to multi-region deployments. This provided the same high level of availability as the disaster recovery capabilities of our physical data centers, and it allowed us to migrate faster, achieve cost savings, and prove the migration solution at a smaller scale early in the modernization process.
We also had to carefully consider network access to development, certification, integration, and production environments, including the Payment Card Industry Data Security Standard (PCI DSS) and personally identifiable information (PII) zones and access restrictions that Sabre’s mission-critical applications require for compliance.
Concentrating decision making in a committee with relevant migration expertise prevented teams from getting blocked and helped us make the most appropriate decisions.
We also introduced or updated documentation throughout the process to unify our understanding of systems and processes, leading to more efficient collaboration and streamlined project execution. In addition, we adopted new and updated Agile and DORA metrics to consistently measure performance and track progress across workstreams.
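To make the metrics idea concrete, here is a minimal sketch of computing two DORA metrics, deployment frequency and change failure rate, from a deployment log. The log data is purely illustrative, not Sabre’s actual figures:

```python
from datetime import date

# Hypothetical deployment log: (date, succeeded) pairs over a 4-week window.
deployments = [
    (date(2024, 1, 2), True),
    (date(2024, 1, 9), False),
    (date(2024, 1, 16), True),
    (date(2024, 1, 23), True),
]

weeks_observed = 4

# Deployment frequency: deploys per week over the observation window.
deployment_frequency = len(deployments) / weeks_observed

# Change failure rate: share of deployments that caused a failure.
change_failure_rate = sum(1 for _, ok in deployments if not ok) / len(deployments)

print(deployment_frequency)  # 1.0 deploy per week
print(change_failure_rate)   # 0.25
```

Tracking these two numbers alongside lead time for changes and mean time to restore gives a workstream a comparable, trend-friendly view of delivery performance.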
Next, we needed the right technology and tools to tackle our legacy code. That started with new and updated developer productivity tools, including common Continuous Integration/Continuous Delivery pipeline frameworks, and consistent integrated development environment (IDE) configuration with automated updates.
We also (re-)introduced various architecture patterns and development practices to improve our system observability, increase delivery consistency, and minimize the risk of downtime:
- Pub/Sub (publisher/subscriber) messaging enabled services to communicate asynchronously, which improved flexibility and prevented slow consumers from delaying publishers.
- Event sourcing improved observability with an immutable, append-only store of data states.
- Strangler pattern lowered the risk of downtime during migration by introducing a façade in front of legacy systems. This allowed for incremental development work and a seamless transfer as new systems were introduced.
- Sharding involved partitioning databases into smaller parts for faster, more scalable, and lower-cost data processing.
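Of these patterns, the strangler pattern is the one most specific to migration, so a minimal sketch may help. A façade routes each request either to the legacy system or to its modernized replacement, so functionality can be moved over route by route with no big-bang cutover. The handlers and route names below are hypothetical, not Sabre’s actual services:

```python
# Hypothetical legacy and modern handlers for the same functionality.
def legacy_handler(request: str) -> str:
    return f"legacy:{request}"

def modern_handler(request: str) -> str:
    return f"modern:{request}"

class StranglerFacade:
    """Façade in front of the legacy system: callers see one interface
    while individual routes are incrementally migrated behind it."""

    def __init__(self) -> None:
        self.migrated_routes: set[str] = set()

    def migrate(self, route: str) -> None:
        # Flip a single route over to the new system.
        self.migrated_routes.add(route)

    def handle(self, route: str, request: str) -> str:
        if route in self.migrated_routes:
            return modern_handler(request)
        return legacy_handler(request)

facade = StranglerFacade()
facade.migrate("check-in")  # only check-in has been modernized so far
modern_result = facade.handle("check-in", "PNR123")  # served by new system
legacy_result = facade.handle("booking", "PNR123")   # still served by legacy
```

Because callers only ever talk to the façade, each route can be cut over (and rolled back) independently, which is what lowers the downtime risk during migration.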
Reaping the rewards and innovating in parallel
Ultimately, we implemented a combination of activities across our tech stack and engineering organization to address multiple legacy systems. The effort has been worth it.
- We moved 93% of compute capacity from physical data centers to the public cloud, resulting in a 50% decrease in our compute costs.
- We migrated more than 120 million bookings from a mainframe-based system to one using Google Cloud’s Spanner database, without impacting customer operations.
- And we modernized siloed data stores by migrating them to Google’s BigQuery data warehouse, lowering costs and improving security and data management in the process.
As systems were modernized, it presented opportunities to try new technologies that were not previously available to us. Having data stored in more modern systems has enabled Sabre to use the Google Vertex AI platform to build AI-enabled products that increase revenue and operating margins for our customers, such as Air Price IQ™ to personalize flight prices, and Ancillary IQ™ to sell ancillary products based on market and traveler context.
Maintaining or modernizing legacy code and innovating with new products isn’t an either/or decision. We need to do both in parallel to stay competitive and deliver value for customers. Most importantly, using data moved to the cloud, we developed and deployed Travel AI microservices across application suites.
For example, we recently developed a hybrid capability that integrates AI-driven decision making into a check-in system due for modernization, re-accommodating passengers when flights are disrupted. Deploying such new capabilities while the older system ran in parallel demonstrated to the business that we could extend the scalability and intelligence of legacy code while carrying out modernization activities.
Similarly, we were able to infuse generative AI into our existing tooling, allowing engineers to triage issues faster by cross-referencing context sensitive information from different sources using natural language, significantly lowering our mean time to repair (MTTR) during system incidents.
Thanks to careful consideration in the planning stage, executive buy-in, and close collaboration with trusted third parties, we were able to kick off one of the largest tech transformation projects in the travel industry. I hope some of the lessons we learned are helpful in your own journey to tackle legacy code and modernize your systems for business impact.
About the Author
Suresh Vellanki is Senior Vice President of Software Engineering at Sabre and leads product development for Commercial Products, Fulfillment and Data & Analytics solutions. In this role, he led Sabre’s Cloud Migration and Mainframe Offload technology transformation initiatives, as well as moving Sabre’s Data & Analytics platform to Google Cloud. He also brings engineering and operational rigor to the company’s key Personalization initiatives, including Intelligent Retailing and New Distribution Capability (NDC).
We partnered with LeadDev, the global engineering leadership community, to discuss how enterprises can overcome the challenges of legacy code.