
CASE 01 / 03 · 2024 · Consulting practice

Enterprise DITA Training Curriculum

From a blank repository to a 76-topic, 10-part progressive learning program — authored, enriched, and published entirely in DITA.

Duration
16 weeks
Team
1 lead architect · 2 content engineers · 1 publishing engineer
Engagement
Project-based, fixed scope
Status
Shipped · in production across three learner audiences

Challenge and approach

The challenge

A growing consulting practice needed a comprehensive, hands-on DITA training program that could serve three distinct audiences — technical writers, software engineers, and content strategists — with different learning paths, difficulty levels, and time commitments. Existing materials were scattered across Markdown files, slide decks, and tribal knowledge with no reuse, no metadata, and no structured publishing pipeline.

The approach

We designed a 10-part bookmap architecture covering the full content lifecycle: from 'Why Intelligent Content Matters' through authoring, information architecture, metadata enrichment, publishing automation, DITA-OT development, XSLT customization, and AI pipeline integration. Every topic was formally typed (concept, task, or reference), enriched with 10+ metadata fields, and validated against a controlled vocabulary subject scheme.
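To illustrate the typing discipline (the id, title, and step text here are hypothetical examples, not topics from the actual curriculum), a formally typed DITA task topic looks like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical example of a formally typed task topic -->
<task id="publish-pdf">
  <title>Publish the curriculum as PDF</title>
  <taskbody>
    <prereq><p>DITA-OT is installed and on your PATH.</p></prereq>
    <steps>
      <step><cmd>Run the PDF transform against the master bookmap.</cmd></step>
      <step><cmd>Review the generated PDF for pagination issues.</cmd></step>
    </steps>
  </taskbody>
</task>
```

The `task` doctype constrains the body to prerequisite/step structures, which is what lets tooling distinguish procedures from conceptual material.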

Artifact Ledger


  • 76 DITA topics
  • 10 learning parts
  • 12 hands-on labs
  • 218 PDF pages

Stack


Schema
DITA 1.3 · custom subject scheme
Authoring
Oxygen XML Editor · Schematron rules
Tooling
Python 3.12 (metadata enrichment)
Publishing
DITA-OT 4.2 · Apache FOP 2.9 · XSL-FO
Build
Containerized shell scripts · single-command HTML5 + PDF
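A minimal sketch of what a single-command build wrapper can look like (the map path is hypothetical; the engagement's actual scripts run containerized). It assumes the standard DITA-OT 4.x `dita` command is on the PATH:

```shell
#!/usr/bin/env sh
# Hypothetical single-command build: HTML5 + PDF from one bookmap.
set -e

MAP=curriculum/master.ditamap   # hypothetical path to the master bookmap

dita --input="$MAP" --format=html5 --output=out/html5
dita --input="$MAP" --format=pdf2  --output=out/pdf
```

`pdf2` is the DITA-OT transform name for its XSL-FO/Apache FOP PDF pipeline; a branded plugin registers its own transform name and is invoked the same way.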

What we delivered

Content architecture

Master bookmap with 10 submaps, a shared resource library, and four appendices. Three role-based learning paths with phase checkpoints and time estimates — 25 to 50 hours depending on role.
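Schematically, the master bookmap nests the ten parts as submaps (part titles are from the curriculum; the file names are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap>
  <booktitle><mainbooktitle>Enterprise DITA Training Curriculum</mainbooktitle></booktitle>
  <part format="ditamap" href="parts/01-why-intelligent-content.ditamap"/>
  <part format="ditamap" href="parts/02-content-preparation.ditamap"/>
  <!-- parts 03 through 09 follow the same pattern -->
  <part format="ditamap" href="parts/10-benefits-and-impact.ditamap"/>
  <appendixes>
    <appendix href="appendix/glossary.dita"/>
  </appendixes>
</bookmap>
```

Keeping each part in its own submap lets role-based learning paths assemble different sequences from the same topic pool without duplicating content.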

Before / after samples

Five unstructured XML files transformed into six fully typed DITA topics — demonstrating content splitting, semantic markup, and metadata enrichment at every step of the conversion.

Assessment materials

Eighty interview questions with reference answers organized by skill level. A structured learner study guide with five phases, prerequisites, and a vocabulary glossary for onboarding.

Curriculum coverage


  1. Why Intelligent Content
  2. Content Preparation
  3. Authoring
  4. Information Architecture
  5. Metadata Enrichment
  6. Publishing Automation
  7. DITA-OT Development
  8. XSLT Customization
  9. AI Pipeline Integration
  10. Benefits & Impact Measurement

Decisions and trade-offs

The choices that shaped the engagement, recorded with the option taken and what was rejected. The reasoning matters more than the outcome.

  1. Topic typing

    Chosen

    Formal DITA concept / task / reference

    Rejected

    Generic <article> / <section> structure

    Why: Formal types let authors and tooling distinguish 'what is X' from 'how to do X' — essential for learner-path routing and downstream AI retrieval, but also for human navigability across a 76-topic corpus.

  2. Bookmap organization

    Chosen

    10 thematic parts (progressive curriculum)

    Rejected

    Flat topic library indexed by tags

    Why: Curriculum needs sequence dependency and prerequisite expression; a flat library can't carry phase-by-phase ordering without manual cross-references that drift the moment the content is edited.

  3. Metadata strategy

    Chosen

    Controlled-vocabulary subject scheme

    Rejected

    Free-text tags applied at author discretion

    Why: Ad-hoc tags accrete inconsistent values within months. A subject scheme rejects invalid metadata at validation time and gives downstream filters something they can rely on.
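The enforcement idea can be sketched in a few lines of Python (the field names and vocabulary values here are illustrative, not the engagement's actual scheme):

```python
# Hypothetical sketch: reject metadata values that are not in the
# controlled vocabulary, instead of accepting free-text tags.
ALLOWED = {
    "audience": {"technical-writer", "software-engineer", "content-strategist"},
    "difficulty": {"beginner", "intermediate", "advanced"},
}

def validate(metadata: dict) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for field, value in metadata.items():
        allowed = ALLOWED.get(field)
        if allowed is None:
            errors.append(f"unknown metadata field: {field!r}")
        elif value not in allowed:
            errors.append(f"invalid value {value!r} for field {field!r}")
    return errors
```

For example, `validate({"audience": "writers"})` returns one error, while `validate({"audience": "technical-writer"})` returns none. In the actual pipeline this role is played by the DITA subject scheme plus Schematron rules, which reject invalid values at validation time rather than at query time.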

  4. PDF generation

    Chosen

    Custom DITA-OT plugin with XSL-FO

    Rejected

    Default DITA-OT PDF transform

    Why: The default transform's output is not presentable to clients or procurement reviewers. A branded plugin produced the same 218-page document at distribution quality, eliminating manual reformatting cycles.

A note on these numbers

The figures in the artifact ledger are direct counts from the deliverables shipped on this engagement — not ROI projections or aggregated averages. Outcome percentages referenced anywhere on this site reflect industry benchmarks published by OASIS, Gartner, and CIDM for organizations that achieve 40%+ content reuse with structured metadata. Your actual results depend on content volume, language count, update frequency, and current toolchain maturity. Every engagement begins by measuring your baseline so projections are defensible.

Sample Content Assessment

Submit a 20-page sample. We'll return conversion feasibility, content recovery rate, and engineering effort within two business days. The analysis is the basis for any further engagement, with no obligation to proceed.

Submit a sample →