DITA Engineering & Architecture

Out-of-the-box DITA is rarely enough. We engineer the semantic model — specialization, constraints, and SubjectScheme — that enforces consistency across teams of 50+ writers and feeds automation, search, and AI.

The Backbone of Intelligent Content

Standard DITA is powerful, but out-of-the-box DITA rarely meets enterprise requirements.
We engineer the standard to work for you.

Why Engineering Matters

Implementing DITA without engineering the semantic model is like building a house without a blueprint. You might have walls and a roof, but the plumbing won't connect. Effective DITA engineering bridges the gap between raw XML standards and your specific business logic. It ensures that your content is not just “structured” but semantically meaningful, ready for automation, faceted search, and dynamic delivery. At Extense, our information architects don't just know the tags; we know how to constrain and specialize them to enforce consistency across teams of 50+ writers.

Core Competencies

DITA 1.3 & 2.0 Standards
RNG & DTD Constraints
Schematron Validation
SubjectScheme Taxonomy

Our Engineering Workflow

Every engagement runs through five stages, from a semantic audit of your existing content to a packaged, CCMS-ready framework. Each stage produces a tested deliverable that the next one builds on, so the model is engineered once rather than patched after writers hit its limits.

01 Analysis: Audit existing content for semantic patterns.
02 Modeling: Define topic types and domain requirements.
03 Development: Code the DTD/RNG schemas and plugins.
04 Validation: Implement Schematron rules for QA.
05 Deployment: Package into CCMS-ready frameworks.

Engineering the Standard

Three techniques turn raw DITA into a model that fits your product — and your writers.

Specialization

We create custom DITA specializations. If you document APIs, we build an <api-operation> topic type; if you work in pharma, we build a <clinical-protocol> topic type and the document-type shell that integrates it.
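
Specialization stays interoperable because every specialized element records its ancestry in the @class attribute (normally defaulted by the DTD or RNG grammar), so a processor that has never heard of <api-operation> can still treat it as a <reference>. The Python sketch below reads that attribute and picks a fallback; Python is our choice for illustration, and the <api-operation> element and its class value are hypothetical examples following the standard @class pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical <api-operation> topic specialized from <reference>.
SAMPLE = """<api-operation id="get-user"
  class="- topic/topic reference/reference api-operation/api-operation ">
  <title class="- topic/title ">GET /users/{id}</title>
</api-operation>"""


def ancestry(class_value: str) -> list[str]:
    """Return element names from most general to most specific.

    The first token is "-" for structural or "+" for domain
    specializations; each later token has the form "module/element".
    """
    tokens = class_value.split()[1:]
    return [token.split("/")[1] for token in tokens]


def fallback_for(elem: ET.Element, known: set[str]) -> str:
    """Pick the most specific ancestor that a given processor understands."""
    for name in reversed(ancestry(elem.get("class", ""))):
        if name in known:
            return name
    return elem.tag


root = ET.fromstring(SAMPLE)
print(ancestry(root.get("class")))                 # ['topic', 'reference', 'api-operation']
print(fallback_for(root, {"topic", "reference"}))  # reference
```
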
Constraints

Too many choices confuse writers. We use DITA constraints to remove unused elements (like <longquoteref>), simplifying the authoring interface and reducing training time.
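
Before a constraint removes an element, it is worth confirming that no existing topic uses it; otherwise those topics stop validating. A minimal Python sketch of that audit, assuming topics are already loaded as XML strings; the candidate element list is illustrative, not a recommended constraint.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Elements a hypothetical constraint module would remove.
CANDIDATES = {"longquoteref", "lq", "imagemap"}


def usage(topics: dict[str, str]) -> Counter:
    """Count how many times each element name appears across all topics."""
    counts: Counter = Counter()
    for text in topics.values():
        for elem in ET.fromstring(text).iter():
            counts[elem.tag] += 1
    return counts


def safe_to_remove(topics: dict[str, str]) -> set[str]:
    """Candidates that no existing topic uses, so constraining them out breaks nothing."""
    counts = usage(topics)
    return {name for name in CANDIDATES if counts[name] == 0}


topics = {
    "a.dita": "<concept id='a'><title>A</title><conbody><p>Text</p></conbody></concept>",
    "b.dita": "<concept id='b'><title>B</title><conbody><lq>Quote</lq></conbody></concept>",
}
print(sorted(safe_to_remove(topics)))  # ['imagemap', 'longquoteref']
```
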
SubjectScheme Maps

We implement controlled metadata taxonomies with SubjectScheme maps, which define the values allowed in attributes such as @audience or @platform so that filtering stays consistent.
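
The mechanism can be sketched in Python: a subjectScheme map declares the allowed subjects, an <enumerationdef> binds them to an attribute, and a checker flags any value outside that set. This sketch is simplified: it treats one level of child subjects as the permitted values and ignores deeper hierarchies, default subjects, and multiple schemes. The key names are hypothetical.

```python
import xml.etree.ElementTree as ET

SCHEME = """<subjectScheme>
  <subjectdef keys="audience-values">
    <subjectdef keys="administrator"/>
    <subjectdef keys="developer"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="audience"/>
    <subjectdef keyref="audience-values"/>
  </enumerationdef>
</subjectScheme>"""


def allowed_values(scheme_xml: str) -> dict[str, set[str]]:
    """Map each controlled attribute name to the keys of its child subjects."""
    root = ET.fromstring(scheme_xml)
    # Index top-level subject definitions by their @keys.
    subjects = {s.get("keys"): s for s in root.findall("subjectdef")}
    result: dict[str, set[str]] = {}
    for enum in root.findall("enumerationdef"):
        attr = enum.find("attributedef").get("name")
        ref = enum.find("subjectdef").get("keyref")
        result[attr] = {child.get("keys") for child in subjects[ref].findall("subjectdef")}
    return result


def invalid_values(topic_xml: str, allowed: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (attribute, value) pairs that fall outside the scheme."""
    problems = []
    for elem in ET.fromstring(topic_xml).iter():
        for attr, permitted in allowed.items():
            # Conditional attributes hold space-separated tokens.
            for value in elem.get(attr, "").split():
                if value not in permitted:
                    problems.append((attr, value))
    return problems


allowed = allowed_values(SCHEME)
topic = (
    '<task id="t"><title>Install</title><taskbody><steps>'
    '<step audience="developer"><cmd>Run the installer.</cmd></step>'
    '<step audience="admin"><cmd>Check the log.</cmd></step>'
    '</steps></taskbody></task>'
)
print(invalid_values(topic, allowed))  # [('audience', 'admin')]
```
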

Case Study: Semiconductor Fabrication

Scenario

A client needed to document register maps for 500+ chips. Manual entry was error-prone.

Solution

We developed a custom DITA specialization for register data and wrote an XSLT transformation to auto-generate the DITA topics directly from the engineering IP-XACT files.

Result

100% accuracy. Zero manual authoring time for reference data.
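
The engagement used XSLT; the Python sketch below shows the same idea for illustration. It assumes a simplified, namespace-free subset of IP-XACT (real files use the IP-XACT namespace and carry far more detail) and emits one DITA <reference> topic per register with a field table. The topic IDs and table layout are our assumptions, not the client's specialization.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for an IP-XACT register description.
IPXACT = """<component>
  <register>
    <name>CTRL</name><addressOffset>0x00</addressOffset>
    <field><name>EN</name><bitOffset>0</bitOffset><bitWidth>1</bitWidth><access>read-write</access></field>
    <field><name>MODE</name><bitOffset>1</bitOffset><bitWidth>3</bitWidth><access>read-write</access></field>
  </register>
</component>"""


def bit_range(offset: int, width: int) -> str:
    """Format a field's bit span as 'high:low', or a single index for 1-bit fields."""
    high = offset + width - 1
    return str(offset) if width == 1 else f"{high}:{offset}"


def register_topic(reg: ET.Element) -> ET.Element:
    """Build a <reference> topic with one table row per register field."""
    name = reg.findtext("name")
    topic = ET.Element("reference", id=f"reg-{name.lower()}")
    ET.SubElement(topic, "title").text = f"{name} register"
    ET.SubElement(topic, "shortdesc").text = f"Offset {reg.findtext('addressOffset')}."
    table = ET.SubElement(ET.SubElement(topic, "refbody"), "simpletable")
    head = ET.SubElement(table, "sthead")
    for label in ("Field", "Bits", "Access"):
        ET.SubElement(head, "stentry").text = label
    for field in reg.findall("field"):
        row = ET.SubElement(table, "strow")
        bits = bit_range(int(field.findtext("bitOffset")), int(field.findtext("bitWidth")))
        for value in (field.findtext("name"), bits, field.findtext("access")):
            ET.SubElement(row, "stentry").text = value
    return topic


for reg in ET.fromstring(IPXACT).iter("register"):
    print(ET.tostring(register_topic(reg), encoding="unicode"))
```

Generating topics from the engineering source, rather than copying values by hand, is what removes the transcription errors: a change to the IP-XACT file flows into the documentation on the next build.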

Legacy Content Conversion to DITA

Migrate your existing documentation (Word, FrameMaker, HTML, InDesign, or proprietary XML) into clean, structured DITA topics.

Source Analysis & Mapping

We audit your legacy corpus to identify structural patterns, implicit topic boundaries, and reuse candidates. Every paragraph style, heading level, and inline convention is mapped to the appropriate DITA element before a single file is converted.
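
A style map is the core artifact of this step. The Python sketch below applies one to paragraphs that are assumed to be already extracted as (style, text) pairs; the style names and target elements are hypothetical. Unmapped styles are reported rather than guessed, so they can be resolved during analysis.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping produced during source analysis.
STYLE_MAP = {
    "Heading 1": "title",
    "Intro": "shortdesc",
    "Body Text": "p",
    "Note": "note",
}


def to_concept(topic_id: str, paragraphs: list[tuple[str, str]]) -> tuple[ET.Element, list[str]]:
    """Build a <concept> from (style, text) pairs; return it with any unmapped styles."""
    titles, shortdescs, body, unmapped = [], [], [], []
    for style, text in paragraphs:
        tag = STYLE_MAP.get(style)
        if tag is None:
            unmapped.append(style)  # flag for a mapping decision instead of guessing
        elif tag == "title":
            titles.append(text)
        elif tag == "shortdesc":
            shortdescs.append(text)
        else:
            body.append((tag, text))

    concept = ET.Element("concept", id=topic_id)
    # DITA fixes the order title, shortdesc, conbody regardless of source order.
    ET.SubElement(concept, "title").text = " ".join(titles)
    if shortdescs:
        ET.SubElement(concept, "shortdesc").text = " ".join(shortdescs)
    conbody = ET.SubElement(concept, "conbody")
    for tag, text in body:
        ET.SubElement(conbody, tag).text = text
    return concept, unmapped


legacy = [
    ("Heading 1", "Cooling system"),
    ("Intro", "How fan speed is controlled."),
    ("Body Text", "The controller reads two temperature sensors."),
    ("Callout", "See figure 3."),
    ("Note", "Do not block the vents."),
]
concept, unmapped = to_concept("cooling-system", legacy)
print(ET.tostring(concept, encoding="unicode"))
print(unmapped)  # ['Callout']
```
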
Automated Conversion Pipelines

Using industry-proven migration tools (Stilo Migrate, custom XSLT and Python scripts, and pandoc-based workflows), we automate the bulk conversion. Manual intervention is the exception, reserved for edge cases. Typical throughput: thousands of pages per sprint.
Post-Conversion QA

Automated Schematron validation, side-by-side visual diff against the original, and metadata completeness checks. We don't hand off until every topic passes structural and content-fidelity gates.
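
Schematron is the usual home for rules like these; the Python sketch below expresses a few equivalent checks (non-empty title, a short description, no empty paragraphs, unique topic IDs) so the logic is easy to read. The rule set is illustrative, not the full QA gate.

```python
import xml.etree.ElementTree as ET


def check_topic(name: str, xml_text: str) -> list[str]:
    """Return human-readable failures for one topic."""
    root = ET.fromstring(xml_text)
    failures = []
    if not (root.findtext("title") or "").strip():
        failures.append(f"{name}: missing or empty <title>")
    if root.find("shortdesc") is None:
        failures.append(f"{name}: no <shortdesc>")
    for p in root.iter("p"):
        # Empty paragraphs are usually a conversion artifact.
        if not "".join(p.itertext()).strip():
            failures.append(f"{name}: empty <p>")
    return failures


def check_corpus(topics: dict[str, str]) -> list[str]:
    """Check every topic, plus that topic IDs are unique across the set."""
    failures, seen = [], {}
    for name, xml_text in topics.items():
        failures.extend(check_topic(name, xml_text))
        topic_id = ET.fromstring(xml_text).get("id")
        if topic_id in seen:
            failures.append(f"{name}: duplicate id '{topic_id}' (also in {seen[topic_id]})")
        seen[topic_id] = name
    return failures


topics = {
    "a.dita": "<concept id='intro'><title>Intro</title><shortdesc>About.</shortdesc>"
              "<conbody><p>Hi</p></conbody></concept>",
    "b.dita": "<concept id='intro'><title> </title><conbody><p/></conbody></concept>",
}
for failure in check_corpus(topics):
    print(failure)
```
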

Deduplication at Conversion Time

During conversion, we compare every paragraph and inline span across your entire document collection. Exact-match and near-match duplicates are identified, consolidated into reusable warehouse topics referenced by conref, and tracked in a deduplication report, so you don't discover the redundancy months later. Typical result: 15–30% of content identified as duplicate and consolidated in the first pass.
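
A simplified version of the detection step, sketched in Python: exact duplicates are grouped by normalized text, and near duplicates are scored with difflib's SequenceMatcher against an assumed 0.9 threshold. The pairwise comparison is quadratic, so at corpus scale a technique such as MinHash would replace it; location strings such as install.dita#p1 are hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

NEAR_THRESHOLD = 0.9  # assumption; tune against a manually reviewed sample


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't hide duplicates."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(paragraphs: dict[str, str]):
    """Map of location -> text in; (exact_groups, near_pairs) out."""
    by_text = defaultdict(list)
    for location, text in paragraphs.items():
        by_text[normalize(text)].append(location)
    exact = [locs for locs in by_text.values() if len(locs) > 1]

    near = []
    # Compare one representative per distinct normalized text.
    for (a_text, a_locs), (b_text, b_locs) in combinations(by_text.items(), 2):
        ratio = SequenceMatcher(None, a_text, b_text).ratio()
        if ratio >= NEAR_THRESHOLD:
            near.append((a_locs[0], b_locs[0], round(ratio, 2)))
    return exact, near


sample = {
    "install.dita#p1": "Disconnect power before opening the case.",
    "service.dita#p4": "Disconnect  power before opening the case.",
    "repair.dita#p2": "Disconnect power before opening the cover.",
    "intro.dita#p1": "This guide covers model X200.",
}
exact, near = find_duplicates(sample)
print(exact)
print(near)
```

Exact groups can be consolidated automatically; near matches go into the report for a writer to confirm, since a one-word difference is sometimes intentional.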

Information Architecture for DITA

Structure that scales. We design the IA layer that turns a collection of topics into a navigable, filterable, reusable content system.

What We Define

Topic typing rules — when to use concept vs. task vs. reference vs. troubleshooting
Map hierarchy design — bookmap, submap, and relationship table structures for your deliverables
Key architecture — keydef maps, key scopes, and variable resolution strategies for multi-product content
Reuse strategy — conref, conkeyref, and content-reference library patterns that scale without fragility
Metadata & filtering model — @audience, @platform, @product, @rev attributes plus SubjectScheme-controlled values
Naming & folder conventions — file naming, directory structure, and ID strategies that work across CCMS, Git, and CI pipelines
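
Key-based variables illustrate the key architecture item above. The Python sketch below resolves empty <keyword keyref="..."/> elements from <keydef> text within a single map, where the first definition of a key takes precedence. It deliberately leaves out key scopes, submap traversal, and keyrefs on other elements, and the key names are hypothetical.

```python
import xml.etree.ElementTree as ET

KEYMAP = """<map>
  <keydef keys="product-name">
    <topicmeta><keywords><keyword>Acme Router X2</keyword></keywords></topicmeta>
  </keydef>
  <!-- A later, shared default: ignored because product-name is already defined. -->
  <keydef keys="product-name">
    <topicmeta><keywords><keyword>Generic Router</keyword></keywords></topicmeta>
  </keydef>
  <keydef keys="version">
    <topicmeta><keywords><keyword>4.2</keyword></keywords></topicmeta>
  </keydef>
</map>"""


def key_text(map_xml: str) -> dict[str, str]:
    """Collect the keyword text for each key; the first definition wins."""
    keys: dict[str, str] = {}
    for keydef in ET.fromstring(map_xml).iter("keydef"):
        text = keydef.findtext("topicmeta/keywords/keyword")
        for key in keydef.get("keys", "").split():
            keys.setdefault(key, text)  # later definitions never override
    return keys


def resolve(topic_xml: str, keys: dict[str, str]) -> ET.Element:
    """Fill empty <keyword keyref> elements with resolved text (only empty ones, in this sketch)."""
    root = ET.fromstring(topic_xml)
    for kw in root.iter("keyword"):
        ref = kw.get("keyref")
        if ref and not kw.text and keys.get(ref):
            kw.text = keys[ref]
    return root


TOPIC = (
    '<task id="setup"><title>Set up <keyword keyref="product-name"/></title>'
    '<taskbody><context><p>Requires firmware <keyword keyref="version"/>.</p></context>'
    '</taskbody></task>'
)
resolved = resolve(TOPIC, key_text(KEYMAP))
print("".join(resolved.find("title").itertext()))  # Set up Acme Router X2
```
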

Governance Deliverables

Every IA engagement produces a DITA Style Guide — a living document that codifies topic templates, element usage rules, metadata requirements, and writing patterns. This becomes your team's single source of truth for content standards.

AI-Ready Content Engineering

Your DITA content is already structured. We make it AI-ready: optimized for retrieval, embedding, and LLM grounding.

RAG-Optimized Chunking

We size and structure topics for optimal vector embedding — right-sized for token limits, self-contained for retrieval accuracy. Short descriptions become embedding-friendly summaries. Metadata becomes filter context.
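
A minimal Python sketch of topic-aligned chunking, assuming word counts as a stand-in for model tokens and a 200-token budget (both assumptions). Each chunk opens with the topic title and short description, and the topic type plus conditional attributes travel as metadata for filtered retrieval. Only <p> content is chunked here.

```python
import xml.etree.ElementTree as ET

MAX_TOKENS = 200  # assumed budget; set from your embedding model's limit


def text_of(elem) -> str:
    """All text inside an element, with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split()) if elem is not None else ""


def approx_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words."""
    return len(text.split())


def chunk_topic(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    title = text_of(root.find("title"))
    summary = text_of(root.find("shortdesc"))
    # Topic type and conditional attributes become filter context in the vector store.
    meta = {"id": root.get("id"), "topic_type": root.tag}
    meta.update({a: root.get(a) for a in ("audience", "platform", "product") if root.get(a)})

    groups, current = [], []
    for para in (text_of(p) for p in root.iter("p")):
        # Split only at paragraph boundaries; the budget counts body text, not the prefix.
        if current and approx_tokens(" ".join(current + [para])) > MAX_TOKENS:
            groups.append(current)
            current = []
        current.append(para)
    if current:
        groups.append(current)

    # Prefix every chunk with title and shortdesc so it stands alone when retrieved.
    return [
        {"text": "\n".join([f"{title}. {summary}", *body]), "metadata": {**meta, "chunk": i}}
        for i, body in enumerate(groups)
    ]
```

Splitting only at paragraph boundaries keeps each chunk readable on its own; a single paragraph longer than the budget becomes its own oversize chunk rather than being cut mid-sentence.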
Semantic Labeling

Every topic carries machine-readable labels — topic type, product version, audience, and domain context. When an LLM retrieves a chunk, it knows what kind of content it's citing and can attribute the source precisely.
JSON-LD & Knowledge Graph Mapping

We map DITA metadata and SubjectScheme taxonomies to Schema.org types and RDF triples. Your documentation becomes a queryable knowledge graph — not just a file system of XML documents.
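
A minimal Python sketch of the metadata-to-Schema.org step. The mappings (topic to TechArticle, @audience to Audience.audienceType, prolog keywords to about) and the base URL are our assumptions; TechArticle, Audience, audienceType, and about are standard Schema.org vocabulary.

```python
import json
import xml.etree.ElementTree as ET

BASE_URL = "https://docs.example.com/"  # placeholder


def topic_jsonld(xml_text: str, href: str) -> dict:
    """Describe one DITA topic as a Schema.org TechArticle."""
    root = ET.fromstring(xml_text)
    doc = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "@id": BASE_URL + href,
        "headline": "".join(root.find("title").itertext()).strip(),
        "description": " ".join((root.findtext("shortdesc") or "").split()),
    }
    audiences = root.get("audience", "").split()
    if audiences:
        doc["audience"] = [{"@type": "Audience", "audienceType": a} for a in audiences]
    # Prolog keywords become "about" entries (a hypothetical mapping choice).
    subjects = [k.text for k in root.iterfind("prolog/metadata/keywords/keyword") if k.text]
    if subjects:
        doc["about"] = [{"@type": "Thing", "name": s} for s in subjects]
    return doc


topic = (
    '<concept id="fan-control" audience="administrator">'
    '<title>Fan control</title><shortdesc>How fan speed is regulated.</shortdesc>'
    '<prolog><metadata><keywords><keyword>thermal management</keyword></keywords></metadata></prolog>'
    '<conbody><p>Text.</p></conbody></concept>'
)
print(json.dumps(topic_jsonld(topic, "fan-control.html"), indent=2))
```
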

From Documentation to Data Asset

Well-engineered DITA is 70% of what you need for an AI content pipeline. The topic architecture provides natural chunk boundaries, metadata provides filter dimensions, and short descriptions provide ready-made summaries. We close the remaining 30% with transform plugins, embedding pipelines, and delivery API integration.

Delivery & Hosting

Content that's structured but not delivered is content that doesn't exist.
We build the last mile — from DITA source to live, accessible output.

Static Site Publishing

DITA-OT builds HTML5 output deployed to AWS S3 + CloudFront, Azure Blob, Netlify, or GitHub Pages. CI/CD pipelines trigger rebuilds on every commit — your documentation site is always current.
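
The build step in such a pipeline can be sketched in Python around the DITA-OT `dita` launcher, assuming DITA-OT 3.x or later is on PATH. The map and DITAVAL paths are hypothetical, and upload to the hosting target is a separate pipeline step.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def dita_command(ditamap: Path, out_dir: Path, ditaval: Optional[Path] = None) -> list[str]:
    """Assemble the DITA-OT command line for an HTML5 build."""
    cmd = ["dita", f"--input={ditamap}", "--format=html5", f"--output={out_dir}"]
    if ditaval is not None:
        cmd.append(f"--filter={ditaval}")  # conditional-processing profile
    return cmd


def build(ditamap: Path, out_dir: Path, ditaval: Optional[Path] = None) -> int:
    """Run the build; a non-zero return code should fail the CI job before deployment."""
    if shutil.which("dita") is None:
        raise FileNotFoundError("dita launcher not found on PATH")
    return subprocess.run(dita_command(ditamap, out_dir, ditaval), check=False).returncode


print(" ".join(dita_command(Path("docs/user-guide.ditamap"), Path("out/html5"), Path("docs/web.ditaval"))))
```

In a CI job, passing the result of `build(...)` to `sys.exit` propagates a failed build to the pipeline, so a broken commit never reaches the deploy stage.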
Headless Content APIs

Transform DITA to JSON and serve through REST or GraphQL endpoints. Applications, chatbots, and embedded help panels pull exactly the content they need at runtime — no monolithic help portal required.
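
A minimal sketch of a headless endpoint using Python's standard WSGI interface. The in-memory TOPICS store, the /topics/<id> route, and the JSON shape are hypothetical; a production service would read JSON produced at build time rather than parse XML per request.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical in-memory store of DITA source.
TOPICS = {
    "reset-password": (
        '<task id="reset-password"><title>Reset a password</title>'
        '<shortdesc>Restore access to an account.</shortdesc>'
        '<taskbody><steps><step><cmd>Open Settings.</cmd></step>'
        '<step><cmd>Select Reset.</cmd></step></steps></taskbody></task>'
    ),
}
PREFIX = "/topics/"


def topic_to_json(xml_text: str) -> dict:
    """Project a task topic onto a small JSON shape for runtime clients."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "type": root.tag,
        "title": "".join(root.find("title").itertext()),
        "shortdesc": root.findtext("shortdesc", ""),
        "steps": ["".join(cmd.itertext()) for cmd in root.iter("cmd")],
    }


def app(environ, start_response):
    """WSGI application: GET /topics/<id> returns the topic as JSON."""
    path = environ.get("PATH_INFO", "")
    topic_id = path[len(PREFIX):] if path.startswith(PREFIX) else None
    if topic_id not in TOPICS:
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "not found"}).encode("utf-8")]
    body = json.dumps(topic_to_json(TOPICS[topic_id])).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For local testing, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` from the standard library serves it; a chatbot or help panel then requests only the topic it needs.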
PDF & Print

Branded PDF output via XSL-FO or CSS Paged Media (Prince XML, Antenna House). We build PDF plugins that match your corporate identity — cover pages, headers, footers, legal boilerplate — all templated and automated.
Portal & CMS Integration

Publish DITA output into Drupal, WordPress, Salesforce Knowledge, or SharePoint. We build the connector layer that maps topics to CMS content types, preserves navigation hierarchy, and handles incremental updates.
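
Preserving navigation usually means turning the DITA map into parent-child records the CMS can import. A Python sketch under that assumption: each <topicref> becomes a record with a parent and a sibling weight. The record fields are hypothetical, and <mapref>, <topichead>, and key-based references are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MAP = """<map>
  <topicref href="install.dita">
    <topicmeta><navtitle>Installation</navtitle></topicmeta>
    <topicref href="install-linux.dita"><topicmeta><navtitle>Linux</navtitle></topicmeta></topicref>
    <topicref href="install-windows.dita"><topicmeta><navtitle>Windows</navtitle></topicmeta></topicref>
  </topicref>
  <topicref href="configure.dita"><topicmeta><navtitle>Configuration</navtitle></topicmeta></topicref>
</map>"""


def flatten(map_xml: str) -> list[dict]:
    """Depth-first walk producing one record per topicref, in navigation order."""
    records: list[dict] = []

    def walk(node: ET.Element, parent: Optional[str]) -> None:
        for weight, ref in enumerate(node.findall("topicref")):
            href = ref.get("href")
            records.append({
                "href": href,
                "title": ref.findtext("topicmeta/navtitle"),
                "parent": parent,  # CMS parent item; None for top level
                "weight": weight,  # order among siblings
            })
            walk(ref, href)

    walk(ET.fromstring(map_xml), None)
    return records


for record in flatten(MAP):
    print(record)
```

For incremental updates, the connector would compare each record and its rendered body against what the CMS already holds and push only the differences.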
Context-Sensitive Help

Map topics to application screens using resource IDs. We configure the delivery layer so your software's help button retrieves the right topic for the current context — in-app panels, modals, or external links.
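
The lookup can be sketched in Python from <resourceid> elements (using the DITA 1.3 @appname and @appid attributes) in the map's topic metadata. The application name, IDs, base URL, and the .dita-to-.html output naming are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

MAP = """<map>
  <topicref href="settings/network.dita">
    <topicmeta><resourceid appname="console" appid="NET_SETTINGS"/></topicmeta>
  </topicref>
  <topicref href="settings/users.dita">
    <topicmeta><resourceid appname="console" appid="USER_ADMIN"/></topicmeta>
  </topicref>
</map>"""


def help_index(map_xml: str, appname: str) -> dict[str, str]:
    """Map each context ID for one application to its output page."""
    index = {}
    for ref in ET.fromstring(map_xml).iter("topicref"):
        page = str(PurePosixPath(ref.get("href")).with_suffix(".html"))
        for rid in ref.findall("topicmeta/resourceid"):
            if rid.get("appname") == appname and rid.get("appid"):
                index[rid.get("appid")] = page
    return index


def help_url(index: dict[str, str], context_id: str, base: str, fallback: str = "index.html") -> str:
    """URL the application's Help button would open for a given screen."""
    return base + index.get(context_id, fallback)


index = help_index(MAP, "console")
print(help_url(index, "USER_ADMIN", "https://help.example.com/"))
```

Falling back to a landing page for unknown IDs keeps the Help button working while new screens are still waiting for their topics.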
Hosted Documentation Portals

Fully managed documentation hosting with search, versioning, access control, and analytics. We set up and maintain the infrastructure so your team focuses on content, not servers.

DITA-OT Services

The DITA Open Toolkit is the engine behind every output. We keep it running, extend it, and upgrade it.

DITA-OT Upgrades

Migrating from DITA-OT 2.x or 3.x to 4.x? We handle the upgrade — testing your existing plugins for compatibility, updating deprecated Ant targets, resolving XSLT conflicts, and validating output parity against your current builds.
Custom Plugin Development

We build DITA-OT plugins for branded HTML5 themes, custom PDF layouts, JSON/YAML output, SCORM packaging, Salesforce Knowledge export, and any transform target your pipeline requires. Delivered with unit tests and documentation.
CI/CD Pipeline Integration

DITA-OT wrapped in Docker containers, triggered by GitHub Actions, Jenkins, GitLab CI, or Azure DevOps. We build the pipeline that validates, transforms, and deploys on every merge — with build failure notifications and output artifact archiving.
Performance Optimization

Slow builds with large document sets? We profile your DITA-OT pipeline, optimize XSLT transforms, configure parallel processing, and implement incremental builds so only changed topics are re-rendered.
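
Change detection is the core of an incremental build. The Python sketch below hashes each source file and compares it with a manifest saved by the previous build; the manifest format is our assumption. It is deliberately incomplete: a changed conref source or keydef map should also invalidate every topic that depends on it, and tracking those dependencies is the harder part.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(files: dict[str, bytes]) -> dict[str, str]:
    """SHA-256 digest per source file, keyed by path."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def changed(current: dict[str, str], previous: dict[str, str]) -> set[str]:
    """Files that are new or whose content differs from the previous build."""
    return {name for name, digest in current.items() if previous.get(name) != digest}


def removed(current: dict[str, str], previous: dict[str, str]) -> set[str]:
    """Files deleted since the previous build; their output should be pruned."""
    return set(previous) - set(current)


def load_manifest(path: Path) -> dict[str, str]:
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_manifest(path: Path, digests: dict[str, str]) -> None:
    path.write_text(json.dumps(digests, indent=2, sort_keys=True), encoding="utf-8")


previous = fingerprint({"a.dita": b"<topic id='a'/>", "b.dita": b"<topic id='b'/>"})
current = fingerprint({"a.dita": b"<topic id='a'/>", "b.dita": b"<topic id='b' rev='2'/>",
                       "c.dita": b"<topic id='c'/>"})
print(sorted(changed(current, previous)))  # ['b.dita', 'c.dita']
```
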
Troubleshooting & Support

Broken builds, rendering inconsistencies, plugin conflicts, and encoding issues. We diagnose and fix DITA-OT problems — and document the root cause so your team can avoid repeat issues.
DITA-OT Managed Service

Ongoing maintenance contract: we monitor your build pipelines, apply DITA-OT patch releases, update plugins for compatibility, and provide a dedicated support channel for your publishing team. SLA-backed response times.

Sample Content Assessment

Send us a sample of your DITA source — or your current docs if you're pre-DITA. We'll assess your information model, specialization opportunities, and DITA-OT pipeline, and return a concrete engineering plan. No commitment required.
