Content Migration

Recovery operation, not copy-paste. Most existing content has more value than the current system can extract. Migration is the operation that recovers it — converting legacy estates into reusable assets at scale.

What gets delivered.

Content audit and inventory: Volume, formats, complexity, reuse potential. Decisions about what gets migrated, what gets retired, what gets rewritten.
Conversion scripting: XSLT pipelines, custom converters, off-the-shelf conversion tools where appropriate.
Normalization and metadata enrichment: Cleanup that brings inconsistent legacy content to a single quality bar in the target schema.
QA harnesses: Validation rules, link checking, structural integrity, content equivalence verification.
Cutover and parallel-running plans: How the new system goes live without breaking the old one. Rollback procedures if it does.
Author transition support: Training and runbook handover. The migration ships when authors are operating the new system, not when the conversion finishes.

Outcomes.

45%: Content reuse rate after migration. Average across migrations from unstructured legacy formats to clean DITA.

72%: Translation cost reduction in the first localized release post-migration.

60%: Faster publishing cycles after the new pipeline lands.

Recovery rate, reported above as the content reuse rate, is measured per engagement against the audited source estate. Translation savings and publishing speed are first-year effects after the new pipeline replaces the legacy chain.

Migration isn't moving content — it's identifying recoverable reusable assets. The 45% recovery rate is what makes downstream translation savings possible. When content is recovered as reusable topics rather than rewritten chapter by chapter, localization volume drops proportionally and every subsequent change costs less.

Recent engagements.

A 4,000-page FrameMaker estate moved to DITA.

Format-specific MIF parser plus paragraph-level hash matching across the full estate. 38% of topics were consolidated into a shared warehouse, with conref pointers from their original locations. The first localized release after cutover cut translation cost by 51%; the engineering paid for itself on that release alone.
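As an illustration of paragraph-level hash matching, here is a minimal sketch in Python. It assumes the MIF parser has already produced plain-text paragraphs per document. The function names, the normalization rules, and the sample estate are hypothetical, not the tooling used on the engagement.

```python
import hashlib
import re
from collections import defaultdict


def normalize(paragraph: str) -> str:
    """Collapse whitespace and case so trivial formatting drift does not defeat matching."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def paragraph_hash(paragraph: str) -> str:
    """Stable fingerprint of the normalized paragraph text."""
    return hashlib.sha256(normalize(paragraph).encode("utf-8")).hexdigest()


def find_duplicates(estate):
    """Map each hash seen more than once to its (document, paragraph index) locations."""
    locations = defaultdict(list)
    for doc_id, paragraphs in estate.items():
        for index, paragraph in enumerate(paragraphs):
            if normalize(paragraph):  # ignore empty paragraphs
                locations[paragraph_hash(paragraph)].append((doc_id, index))
    return {h: locs for h, locs in locations.items() if len(locs) > 1}


# Hypothetical two-document estate; note the extra space in the second copy.
estate = {
    "install_guide": ["Disconnect power before servicing.", "Remove the four cover screws."],
    "service_manual": ["Disconnect  power before servicing.", "Replace the filter every 500 hours."],
}
duplicates = find_duplicates(estate)
for locs in duplicates.values():
    print(locs)
```

Each group of locations is a candidate for a single warehouse topic referenced by conref. Exact hashing only catches identical text after normalization; near-duplicates would need a fuzzy comparison on top.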
A 15-year Confluence migration to typed DITA.

REST API extraction plus macro-to-conref mappings preserved the link graph and cross-reference structure. Heading-style consistency rebuilt where the source had drifted across three platform versions. The new repository launched with a single authoring path; the Confluence space was retired six weeks later.
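A macro-to-conref mapping could look roughly like the sketch below. It assumes the include macro arrives in Confluence storage format in the shape shown in SAMPLE, that page titles resolve through a lookup table to warehouse targets, and that a paragraph-level conref is the right replacement. The namespace URIs, the table, and the function names are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): storage-format fragments use the
# ac: and ri: prefixes without declaring them, so we declare them ourselves.
AC = "urn:example:ac"
RI = "urn:example:ri"

# Hypothetical lookup from an included page title to a DITA conref target.
CONREF_TARGETS = {"Safety Notice": "warehouse/safety.dita#safety/notice"}

SAMPLE = (
    '<ac:structured-macro ac:name="include">'
    '<ac:parameter ac:name="">'
    '<ac:link><ri:page ri:content-title="Safety Notice"/></ac:link>'
    "</ac:parameter>"
    "</ac:structured-macro>"
)


def include_macros_to_conrefs(fragment):
    """Return one <p conref="..."/> element per include macro with a known target."""
    wrapped = f'<root xmlns:ac="{AC}" xmlns:ri="{RI}">{fragment}</root>'
    root = ET.fromstring(wrapped)
    results = []
    for macro in root.iter(f"{{{AC}}}structured-macro"):
        if macro.get(f"{{{AC}}}name") != "include":
            continue  # other macros need their own mappings
        page = macro.find(f".//{{{RI}}}page")
        if page is None:
            continue
        target = CONREF_TARGETS.get(page.get(f"{{{RI}}}content-title"))
        if target is not None:
            results.append(ET.Element("p", {"conref": target}))
    return results


for element in include_macros_to_conrefs(SAMPLE):
    print(element.get("conref"))
```

Macros with no entry in the table are skipped here; in a production run they would more plausibly be logged for review so that no cross-reference is dropped silently.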

Anonymized for client confidentiality. Specific scope, contract details, and named outcomes are available under NDA.

Standards and tooling.

Source formats: FrameMaker (.fm, .mif), unstructured Word, Adobe InDesign, legacy CCMS exports, unstructured or partially-structured XML.
Target formats: DITA 1.3, S1000D Issue 5/6, custom XML schemas where required.
Conversion tooling: XSLT 2.0/3.0 pipelines, custom Python or Node converters, off-the-shelf tools (mif2dita, MadCap converters) where appropriate.
QA harnesses: Schematron validation, custom link-check and structural validators, content-equivalence verification.
Containerized builds: Docker-based DITA-OT for repeatable conversion runs across environments.
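To make content-equivalence verification concrete, the sketch below compares the word sequence of a legacy source with the text recovered from a converted topic. It is a simplified illustration under stated assumptions: the source is available as plain text, the topic is well-formed XML, and a word-level similarity ratio is an acceptable signal. The function names and sample content are hypothetical.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def topic_words(dita_xml):
    """Lower-cased words from all text nodes of the topic, in document order."""
    root = ET.fromstring(dita_xml)
    return " ".join(root.itertext()).lower().split()


def equivalence_ratio(source_text, dita_xml):
    """Similarity in [0, 1]; 1.0 means the word sequences are identical."""
    source_words = source_text.lower().split()
    return SequenceMatcher(None, source_words, topic_words(dita_xml)).ratio()


# Hypothetical source paragraph and two conversions of it.
SOURCE = "Safety first. Disconnect the power cable."
FAITHFUL = (
    '<topic id="t1"><title>Safety first.</title>'
    "<body><p>Disconnect the  power cable.</p></body></topic>"
)
LOSSY = '<topic id="t1"><title>Safety first.</title><body/></topic>'

print(equivalence_ratio(SOURCE, FAITHFUL))
print(equivalence_ratio(SOURCE, LOSSY))
```

A harness built on this idea would flag any topic whose ratio falls below an agreed threshold for manual review. Joining text nodes with spaces can split words that straddle inline markup, so a real check would need to handle mixed content more carefully.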

When this goes wrong.

WHEN MIGRATION IS BOTCHED

Failed cutovers leave content debt that never gets paid down.

Migrations that lose fidelity. 'We'll clean it up later' debt that never gets cleaned up. Conversion projects that produce structured-looking content with the same authoring problems as the legacy system — now in a new format. The pattern is consistent: the architecture work was skipped or rushed, and the conversion ran against an unspecified target.

When you’d engage us here.

The migration scope keeps expanding mid-project.

The audit was scoped to one estate; the work surfaced three. Until the inventory is wall-to-wall, scope expansion is the rule, not the exception.
Your current vendor is rekeying instead of parsing.

Hand-rekeying is how migrations never finish. If the conversion plan is a typing pool, the implicit hierarchies in the source disappear — and the new repository inherits the same authoring problems the legacy system had.
Your translation memory doesn't match the new system's segments.

Migration broke the TM link without anyone quantifying it before cutover. Every subsequent localization release pays full translation cost on content that was already translated.
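One way to quantify the TM impact before cutover is to segment the migrated content and measure how many segments still match the translation memory exactly. The sketch below assumes the TM source segments have already been exported to a set of strings and uses a naive sentence splitter; real TM tools apply their own segmentation rules, and all names and sample data here are hypothetical.

```python
import re


def segment(text):
    """Naive segmentation: split after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.sub(r"\s+", " ", part) for part in parts if part]


def exact_match_rate(texts, tm_sources):
    """Fraction of segments in texts that appear verbatim in the TM."""
    segments = [s for text in texts for s in segment(text)]
    if not segments:
        return 0.0
    hits = sum(1 for s in segments if s in tm_sources)
    return hits / len(segments)


# Hypothetical TM and content before and after migration.
tm = {"Disconnect the power cable.", "Remove the cover."}
legacy = ["Disconnect the power cable. Remove the cover."]
migrated = ["Disconnect the power cable.", "Remove the cover and set it aside."]

print(exact_match_rate(legacy, tm))
print(exact_match_rate(migrated, tm))
```

Running the same measurement on the legacy and migrated content shows how much leverage the migration would give up, which is the number to have in hand before cutover rather than after the first localization invoice.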
You're three months past cutover and still running the old system in parallel.

The conversion was mismatched to the target before it ran. Usually the architecture work was rushed, and the parallel running is paying down content debt that should have been resolved during migration.


Sample Content Assessment

Submit a 20-page sample. We'll return a migration feasibility assessment — recovery potential, conversion effort, and the architecture decisions that would shape a production migration. Two business days, no obligation to proceed.
