Portfolio | Harsh Mange

Most migrations fail slowly.
This one almost failed overnight.

By the time we had migrated several customers successfully, we had developed a false sense of confidence. The scripts were stable. The process was repeatable. The unknowns felt manageable.

Then came the biggest customer.

They were large — both in terms of data volume and commercial importance.
Their migration required days of preparation, including pre-running scripts to move historical data before the final cutover.

I had planned to start their preparation in two days.

Then something unexpected happened.

The Unplanned Customer

The product team informed us that another customer — much smaller in size — had to be migrated the very next day.

Postponing wasn’t an option.

So we went ahead with the smaller migration and completed it successfully. By the time it ended, it was already late evening.

Around 8–9 PM, I started preparing the migration for the big customer.

That’s when everything broke.

The same scripts that had worked reliably for all previous customers started failing with bizarre, unexplained errors.

No clear logs
No obvious root cause
No deterministic reproduction

And to make things worse, this customer had a custom requirement.

They didn’t want a clean migration.

They wanted us to merge data from both platforms — old and new — into a single organization.
Our migration tooling was never designed for this. It assumed a clean org creation with fresh data.

This meant:

Additional transformation logic
Manual reconciliation steps
A completely different execution flow
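To make the difference from a clean migration concrete, here is a minimal sketch of a merge step. It is illustrative only, not the actual tooling: the `Record` type, the shared `external_id` key, and the "newest update wins" rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    external_id: str       # hypothetical key shared by both platforms
    updated_at: datetime
    payload: dict


def merge_records(old: list, new: list) -> tuple:
    """Merge two record sets into one.

    On a key collision, keep the most recently updated record (an assumed
    rule) and report the key so it can be reconciled manually.
    """
    merged = {r.external_id: r for r in new}
    conflicts = []
    for rec in old:
        existing = merged.get(rec.external_id)
        if existing is None:
            # Only present on the old platform: carry it over as-is.
            merged[rec.external_id] = rec
        else:
            conflicts.append(rec.external_id)
            if rec.updated_at > existing.updated_at:
                merged[rec.external_id] = rec
    return list(merged.values()), conflicts
```

A clean migration only ever takes the "carry it over" branch. A merge has to handle the collision branch too, which is where the extra transformation logic and manual reconciliation come from.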

Then the data dump failed.

Repeatedly.

By 4 AM, nothing had worked.

By 6 AM, I had partially stabilized the scripts, but it was clear:

It was mathematically impossible to complete this migration safely in the time we had.

One of the scripts alone would take ~8 hours.

We had 20+ scripts to execute.

I asked the product team to postpone the migration.

They refused.

When Engineering Meets Business Reality

The product team made the situation explicit:

This customer was large
They were actively evaluating our biggest competitor
Delaying migration could lead to churn

The decision escalated quickly.

On the call were:

Our CEO
Product Head
Another engineer and me

The question wasn’t technical anymore.

It was existential:

Do we attempt an impossible migration, or risk losing the customer?

We chose to attempt it.

But we changed the strategy.

Redesigning the Migration Under Fire

We made a critical decision.

Instead of migrating everything, we would:

Prioritize recent and active tickets
Ensure minimal downtime for ongoing workflows
Accept that historical data might lag temporarily
Communicate this transparently to the customer

Technically, this meant:

Reordering the migration steps
Reducing scope
Re-optimizing the slowest scripts
Accepting controlled inconsistency instead of perfect completeness
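The prioritization idea can be sketched roughly as follows. This is a hedged illustration, not the scripts that actually ran: the `Ticket` fields, the status names, and the 30-day "recent" window are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    ticket_id: int
    status: str            # hypothetical values: "open", "pending", "closed"
    updated_at: datetime


ACTIVE_STATUSES = {"open", "pending"}  # assumption: what counts as "active"


def plan_migration(tickets, now, recent_window=timedelta(days=30)):
    """Split tickets into a first wave and a deferred backlog."""
    cutoff = now - recent_window
    first, deferred = [], []
    for t in tickets:
        if t.status in ACTIVE_STATUSES or t.updated_at >= cutoff:
            first.append(t)
        else:
            deferred.append(t)
    # Newest first, then a stable sort that lifts active tickets to the front.
    first.sort(key=lambda t: t.updated_at, reverse=True)
    first.sort(key=lambda t: t.status not in ACTIVE_STATUSES)
    # Historical data is allowed to lag; migrate it newest-first later.
    deferred.sort(key=lambda t: t.updated_at, reverse=True)
    return first, deferred
```

The first wave is what has to land inside the cutover window. The deferred backlog is the "controlled inconsistency": data that is known to be missing for a while and is backfilled afterwards.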

Then came the hardest part.

One script that originally took 8 hours was the main blocker.

I rethought its logic under pressure:

Reduced unnecessary joins
Changed data batching strategy
Removed redundant transformations
Optimized DB interactions
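As an illustration of the kind of change involved, here is a minimal sketch of one common batching pattern: keyset pagination on the read side and a single bulk insert per batch on the write side. It uses SQLite only to stay self-contained; the `tickets` table, its columns, and the batch size are assumptions, and this is not the actual rewritten script.

```python
import sqlite3


def copy_in_batches(src, dst, batch_size=1000):
    """Copy rows from src.tickets to dst.tickets in fixed-size batches.

    Assumes both connections have a `tickets (id, subject)` table and that
    ids are positive integers.
    """
    last_id = 0
    copied = 0
    while True:
        # Keyset pagination: seek past the last id instead of using OFFSET,
        # so each page costs the same no matter how deep into the table it is.
        rows = src.execute(
            "SELECT id, subject FROM tickets WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        with dst:  # one transaction (and one commit) per batch
            dst.executemany(
                "INSERT INTO tickets (id, subject) VALUES (?, ?)", rows
            )
        last_id = rows[-1][0]
        copied += len(rows)
    return copied
```

Row-by-row inserts with a commit each, or `OFFSET`-based paging over a large table, are typical reasons a script like this can take hours. Whether they were the cause here is not something the sketch claims.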

The result?

It completed in an hour.

Not 8 hours.
An hour.

Execution Under Uncertainty

Even after optimization, things didn’t go smoothly.

We faced:

Unexpected edge cases
Connector failures
Partial migrations
Live traffic interference

But by this point, we had accepted a fundamental truth:

Perfection was impossible. Progress was mandatory.

We kept adapting.
We kept fixing.
We kept migrating.

And against all odds, we completed the migration within the time window.

The customer stayed.
The platform held.
The business survived.

Why This Was the Hardest Part of the Migration Project

Technically, the migration was complex.

But emotionally and cognitively, this night was harder.

It forced me to confront realities that architecture diagrams never show:

Systems don’t fail in isolation — they fail under business pressure
Technical correctness often competes with commercial urgency
Engineers are sometimes asked to solve problems that are mathematically impossible — and still expected to try
The best solution is not always the cleanest, but the most survivable

That night taught me something fundamental:

In real-world systems, engineering is not about building perfect systems.
It’s about making imperfect systems work when they absolutely must.

The Night Everything Went Wrong (A Real Migration Crisis)

Written by Harsh Mange
