Backend
distributed-systems
system-design
migration
backend-engineering
engineering-under-pressure

The Night Everything Went Wrong (A Real Migration Crisis)

A story from the trenches of backend engineering: an overnight migration under pressure, impossible timelines, and the decisions that saved a business-critical customer.

Published: February 14, 2026
4 min read

Most migrations fail slowly.
This one almost failed overnight.

By the time we had migrated several customers successfully, we had developed a false sense of confidence. The scripts were stable. The process was repeatable. The unknowns felt manageable.

Then came the biggest customer.

They were large, in both data volume and commercial importance.
Their migration required days of preparation, including pre-running scripts to move historical data before the final cutover.

I had planned to start their preparation in two days.

Then something unexpected happened.


The Unplanned Customer

The product team informed us that another customer — much smaller in size — had to be migrated the very next day.

Postponing wasn’t an option.

So we went ahead with the smaller migration. It completed successfully, but by the time it finished, it was already late evening.

Around 8–9 PM, I started preparing the migration for the big customer.

That’s when everything broke.

The same scripts that had worked reliably for all previous customers started failing with bizarre, unexplained errors.

  • No clear logs
  • No obvious root cause
  • No deterministic reproduction

And to make things worse, this customer had a custom requirement.

They didn’t want a clean migration.

They wanted us to merge data from both platforms — old and new — into a single organization.
Our migration tooling was never designed for this. It assumed a clean org creation with fresh data.

This meant:

  • Additional transformation logic
  • Manual reconciliation steps
  • A completely different execution flow
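
To make the difference from a clean migration concrete, here is a minimal sketch of what merging two sources into one organization can involve. It is an illustration only, not the tooling used that night: the `Ticket` fields, the shared `external_id` key, and the `merge_tickets` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    external_id: str   # hypothetical key shared by both platforms
    subject: str
    updated_at: int    # hypothetical epoch seconds of the last update


def merge_tickets(old, new):
    """Merge two ticket lists into one org, flagging conflicts for manual review."""
    # Start from the new platform's records, keyed by the shared identifier.
    merged = {t.external_id: t for t in new}
    conflicts = []
    for t in old:
        existing = merged.get(t.external_id)
        if existing is None:
            # Present only on the old platform: carry it over as-is.
            merged[t.external_id] = t
        elif existing != t:
            # Same key, different content: keep the new copy, reconcile by hand.
            conflicts.append((t, existing))
    return list(merged.values()), conflicts
```

A clean migration never reaches the `conflicts` branch, which is roughly why tooling built around fresh org creation has no place to put those records.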

Then the data dump failed.

Repeatedly.

By 4 AM, nothing had worked.

By 6 AM, I had partially stabilized the scripts, but it was clear:

It was mathematically impossible to complete this migration safely.

One of the scripts alone would take ~8 hours.

We had 20+ scripts to execute.

I asked the product team to postpone the migration.

They refused.


When Engineering Meets Business Reality

The product team made the situation explicit:

  • This customer was large
  • They were actively evaluating our biggest competitor
  • Delaying migration could lead to churn

The decision escalated quickly.

On the call were:

  • Our CEO
  • Product Head
  • Me and another engineer

The question wasn’t technical anymore.

It was existential:

Do we attempt an impossible migration, or risk losing the customer?

We chose to attempt it.

But we changed the strategy.


Redesigning the Migration Under Fire

We made a critical decision.

Instead of migrating everything, we would:

  • Prioritize recent and active tickets
  • Ensure minimal downtime for ongoing workflows
  • Accept that historical data might lag temporarily
  • Communicate this transparently to the customer

Technically, this meant:

  • Rewriting migration order
  • Reducing scope
  • Re-optimizing the slowest scripts
  • Accepting controlled inconsistency instead of perfect completeness
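
The scope split described above can be sketched in a few lines. This is a simplified illustration under assumed names, not the actual migration code: the `status` values, the `updated_at` field, and the `recent_cutoff` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: int
    status: str       # hypothetical values: "open", "pending", "closed"
    updated_at: int   # hypothetical epoch seconds of the last update


def plan_migration(tickets, recent_cutoff):
    """Split tickets into a cutover set (migrate now) and a backfill set (migrate later)."""
    cutover, backfill = [], []
    for t in tickets:
        is_active = t.status != "closed"
        is_recent = t.updated_at >= recent_cutoff
        (cutover if is_active or is_recent else backfill).append(t)
    # Most recently touched first, so ongoing workflows are restored earliest.
    cutover.sort(key=lambda t: t.updated_at, reverse=True)
    backfill.sort(key=lambda t: t.updated_at, reverse=True)
    return cutover, backfill
```

Everything in `cutover` has to land inside the migration window. Everything in `backfill` is the historical data that is allowed to lag, which is the "controlled inconsistency" the customer needs to be told about.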

Then came the hardest part.

One script that originally took 8 hours was the main blocker.

I rethought its logic under pressure:

  • Reduced unnecessary joins
  • Changed data batching strategy
  • Removed redundant transformations
  • Optimized DB interactions

The result?

It completed in an hour.

Not 8 hours.
An hour.
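
For readers wondering what a change in batching strategy can look like, here is a generic sketch using Python's built-in `sqlite3` module. It is not the script from that night: the `tickets` table, its columns, and the `copy_tickets` helper are assumptions made for the example. The idea is to replace one insert and one commit per row with one `executemany` call and one commit per batch.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def copy_tickets(src, dst, batch_size=1000):
    """Copy rows from src to dst in batches, committing once per batch."""
    # Hypothetical schema: a `tickets` table with `id` and `subject` columns.
    cursor = src.execute("SELECT id, subject FROM tickets ORDER BY id")
    copied = 0
    for chunk in batched(cursor, batch_size):
        dst.executemany(
            "INSERT INTO tickets (id, subject) VALUES (?, ?)", chunk
        )
        dst.commit()  # one commit per batch instead of one per row
        copied += len(chunk)
    return copied
```

How much a change like this helps depends on the database, the network, and the batch size, so the right `batch_size` is something to measure rather than guess.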


Execution Under Uncertainty

Even after optimization, things didn’t go smoothly.

We faced:

  • Unexpected edge cases
  • Connector failures
  • Partial migrations
  • Live traffic interference

But by this point, we had accepted a fundamental truth:

Perfection was impossible. Progress was mandatory.

We kept adapting.
We kept fixing.
We kept migrating.

And against all odds, we completed the migration within the time window.

The customer stayed.
The platform held.
The business survived.


Why This Was the Hardest Part of the Migration Project

Technically, the migration was complex.

But emotionally and cognitively, this night was harder.

It forced me to confront realities that architecture diagrams never show:

  • Systems don’t fail in isolation — they fail under business pressure
  • Technical correctness often competes with commercial urgency
  • Engineers are sometimes asked to solve problems that are mathematically impossible — and still expected to try
  • The best solution is not always the cleanest, but the most survivable

That night taught me something fundamental:

In real-world systems, engineering is not about building perfect systems.
It’s about making imperfect systems work when they absolutely must.


Written by Harsh Mange

Software Engineer passionate about building scalable backend systems and sharing knowledge through writing.
