
CASE 01 / 03 · 2024 · Consulting practice

Enterprise DITA Training Curriculum

From a blank repository to a 76-topic, 10-part progressive learning program — authored, enriched, and published entirely in DITA.

Duration
16 weeks
Team
1 lead architect · 2 content engineers · 1 publishing engineer
Engagement
Project-based, fixed scope
Status
Shipped · in production across three learner audiences

Challenge and approach

The challenge

A growing consulting practice needed a comprehensive, hands-on DITA training program that could serve three distinct audiences — technical writers, software engineers, and content strategists — with different learning paths, difficulty levels, and time commitments. Existing materials were scattered across Markdown files, slide decks, and tribal knowledge with no reuse, no metadata, and no structured publishing pipeline.

The approach

We designed a 10-part bookmap architecture covering the full content lifecycle: from 'Why Intelligent Content Matters' through authoring, information architecture, metadata enrichment, publishing automation, DITA-OT development, XSLT customization, and AI pipeline integration. Every topic was formally typed (concept, task, or reference), enriched with 10+ metadata fields, and validated against a controlled vocabulary subject scheme.
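To illustrate the typing discipline (the id, title, and step text here are hypothetical examples, not topics from the actual curriculum), a formally typed DITA task topic looks like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical example of a formally typed task topic -->
<task id="publish-pdf">
  <title>Publish the curriculum as PDF</title>
  <taskbody>
    <prereq><p>DITA-OT is installed and on your PATH.</p></prereq>
    <steps>
      <step><cmd>Run the PDF transform against the master bookmap.</cmd></step>
      <step><cmd>Review the generated PDF for pagination issues.</cmd></step>
    </steps>
  </taskbody>
</task>
```

The `task` doctype constrains the body to prerequisite/step structures, which is what lets tooling distinguish procedures from conceptual material.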

Artifact Ledger


  • 76 DITA topics
  • 10 learning parts
  • 12 hands-on labs
  • 218 PDF pages

Stack


Schema
DITA 1.3 · custom subject scheme
Authoring
Oxygen XML Editor · Schematron rules
Tooling
Python 3.12 (metadata enrichment)
Publishing
DITA-OT 4.2 · Apache FOP 2.9 · XSL-FO
Build
Containerized shell scripts · single-command HTML5 + PDF
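A minimal sketch of what a single-command build wrapper can look like (the map path is hypothetical; the engagement's actual scripts run containerized). It assumes the standard DITA-OT 4.x `dita` command is on the PATH:

```shell
#!/usr/bin/env sh
# Hypothetical single-command build: HTML5 + PDF from one bookmap.
set -e

MAP=curriculum/master.ditamap   # hypothetical path to the master bookmap

dita --input="$MAP" --format=html5 --output=out/html5
dita --input="$MAP" --format=pdf2  --output=out/pdf
```

`pdf2` is the DITA-OT transform name for its XSL-FO/Apache FOP PDF pipeline; a branded plugin registers its own transform name and is invoked the same way.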

What we delivered

Content architecture

Master bookmap with 10 submaps, a shared resource library, and four appendices. Three role-based learning paths with phase checkpoints and time estimates — 25 to 50 hours depending on role.
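Schematically, the master bookmap nests the ten parts as submaps (part titles are from the curriculum; the file names are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap>
  <booktitle><mainbooktitle>Enterprise DITA Training Curriculum</mainbooktitle></booktitle>
  <part format="ditamap" href="parts/01-why-intelligent-content.ditamap"/>
  <part format="ditamap" href="parts/02-content-preparation.ditamap"/>
  <!-- parts 03 through 09 follow the same pattern -->
  <part format="ditamap" href="parts/10-benefits-and-impact.ditamap"/>
  <appendixes>
    <appendix href="appendix/glossary.dita"/>
  </appendixes>
</bookmap>
```

Keeping each part in its own submap lets role-based learning paths assemble different sequences from the same topic pool without duplicating content.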

Before / after samples

Five unstructured XML files transformed into six fully typed DITA topics — demonstrating content splitting, semantic markup, and metadata enrichment at every step of the conversion.

Assessment materials

Eighty interview questions with reference answers organized by skill level. A structured learner study guide with five phases, prerequisites, and a vocabulary glossary for onboarding.

Curriculum coverage


  1. Why Intelligent Content
  2. Content Preparation
  3. Authoring
  4. Information Architecture
  5. Metadata Enrichment
  6. Publishing Automation
  7. DITA-OT Development
  8. XSLT Customization
  9. AI Pipeline Integration
  10. Benefits & Impact Measurement

Decisions and trade-offs

The choices that shaped the engagement, recorded with the option taken and what was rejected. The reasoning matters more than the outcome.

  1. Topic typing

    Chosen

    Formal DITA concept / task / reference

    Rejected

    Generic <article> / <section> structure

    Why: Formal types let authors and tooling distinguish 'what is X' from 'how to do X' — essential for learner-path routing and downstream AI retrieval, but also for human navigability across a 76-topic corpus.

  2. Bookmap organization

    Chosen

    10 thematic parts (progressive curriculum)

    Rejected

    Flat topic library indexed by tags

    Why: Curriculum needs sequence dependency and prerequisite expression; a flat library can't carry phase-by-phase ordering without manual cross-references that drift the moment the content is edited.

  3. Metadata strategy

    Chosen

    Controlled-vocabulary subject scheme

    Rejected

    Free-text tags applied at author discretion

    Why: Ad-hoc tags accrete inconsistent values within months. A subject scheme rejects invalid metadata at validation time and gives downstream filters something they can rely on.
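The enforcement idea can be sketched in a few lines of Python (the field names and vocabulary values here are illustrative, not the engagement's actual scheme):

```python
# Hypothetical sketch: reject metadata values that are not in the
# controlled vocabulary, instead of accepting free-text tags.
ALLOWED = {
    "audience": {"technical-writer", "software-engineer", "content-strategist"},
    "difficulty": {"beginner", "intermediate", "advanced"},
}

def validate(metadata: dict) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for field, value in metadata.items():
        allowed = ALLOWED.get(field)
        if allowed is None:
            errors.append(f"unknown metadata field: {field!r}")
        elif value not in allowed:
            errors.append(f"invalid value {value!r} for field {field!r}")
    return errors
```

For example, `validate({"audience": "writers"})` returns one error, while `validate({"audience": "technical-writer"})` returns none. In the actual pipeline this role is played by the DITA subject scheme plus Schematron rules, which reject invalid values at validation time rather than at query time.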

  4. PDF generation

    Chosen

    Custom DITA-OT plugin with XSL-FO

    Rejected

    Default DITA-OT PDF transform

    Why: The default transform's output is not presentable to clients or procurement reviewers. A branded plugin produced the same 218-page document at distribution quality, eliminating manual reformatting cycles.

A note on these numbers

The figures in the artifact ledger are direct counts from the deliverables shipped on this engagement — not ROI projections or aggregated averages. Outcome percentages referenced anywhere on this site reflect industry benchmarks published by OASIS, Gartner, and CIDM for organizations that achieve 40%+ content reuse with structured metadata. Your actual results depend on content volume, language count, update frequency, and current toolchain maturity. Every engagement begins by measuring your baseline so projections are defensible.

Sample Content Assessment

Submit a 20-page sample. We'll return conversion feasibility, content recovery rate, and engineering effort within two business days. The analysis is the basis for any further engagement, with no obligation to proceed.

Submit a sample →