DITA Engineering & Architecture
Out-of-the-box DITA is rarely enough. We engineer the semantic model — specialization, constraints, and SubjectScheme — that enforces consistency across teams of 50+ writers and feeds automation, search, and AI.
The Backbone of Intelligent Content
Standard DITA is powerful, but “out-of-the-box” is rarely enough for enterprise requirements.
We engineer the standard to work for you.
Why Engineering Matters
Implementing DITA without engineering the semantic model is like building a house without a blueprint. You might have walls and a roof, but the plumbing won't connect. Effective DITA engineering bridges the gap between raw XML standards and your specific business logic. It ensures that your content is not just “structured” but semantically meaningful, ready for automation, faceted search, and dynamic delivery. At Extense, our information architects don't just know the tags; we know how to constrain and specialize them to enforce consistency across teams of 50+ writers.
Core Competencies
- DITA 1.3 & 2.0 Standards
- RNG & DTD Constraints
- Schematron Validation
- SubjectScheme Taxonomy
Our Engineering Workflow
Every engagement runs through five sequential steps, from a semantic audit of your existing content to a packaged, CCMS-ready framework. Each stage produces a tested deliverable the next one builds on, so the model is engineered once rather than patched after writers hit its limits.
- 01
Analysis
Audit existing content for semantic patterns.
- 02
Modeling
Define topic types and domain requirements.
- 03
Development
Code the DTD/RNG schemas and plugins.
- 04
Validation
Implement Schematron rules for QA.
- 05
Deployment
Package into CCMS-ready frameworks.
Engineering the Standard
Three techniques turn raw DITA into a model that fits your product — and your writers.
-
Specialization
We create custom DITA specializations. If you are documenting APIs, we build an <api-operation> topic type. If you are in pharma, we build a <clinical-protocol> shell.
-
Constraints
Too many choices confuse writers. We use DITA constraints to remove unused elements (like <longquoteref>), simplifying the authoring interface and reducing training time.
-
SubjectScheme Maps
We implement robust metadata taxonomies using SubjectScheme. This allows you to control the values allowed in attributes like @audience or @platform, ensuring consistent filtering.
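To illustrate how SubjectScheme-controlled values can be enforced, here is a minimal Python sketch. It assumes a small scheme that binds @platform to an "os" category; the sample scheme, the sample task, and the function names are hypothetical, and grouped attribute values are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical SubjectScheme: @platform may only use keys under "os".
SCHEME = """<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="windows"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>"""

# Hypothetical topic with one value that is not in the taxonomy.
TOPIC = """<task id="install" platform="linux">
  <title>Install</title>
  <taskbody><steps>
    <step platform="solaris"><cmd>Run the installer.</cmd></step>
  </steps></taskbody>
</task>"""


def allowed_values(scheme_root):
    """Return {attribute name: set of allowed keys} from enumerationdef bindings."""
    subjects = {}
    for subject in scheme_root.iter("subjectdef"):
        for key in (subject.get("keys") or "").split():
            subjects[key] = subject
    allowed = {}
    for enum in scheme_root.iter("enumerationdef"):
        attr = enum.find("attributedef").get("name")
        category = subjects[enum.find("subjectdef").get("keyref")]
        values = set()
        for child in category.iter("subjectdef"):
            if child is not category:  # skip the category itself
                values.update((child.get("keys") or "").split())
        allowed[attr] = values
    return allowed


def find_violations(topic_root, allowed):
    """List (element, attribute, value) triples that fall outside the taxonomy."""
    violations = []
    for element in topic_root.iter():
        for attr, values in allowed.items():
            for token in (element.get(attr) or "").split():
                if token not in values:
                    violations.append((element.tag, attr, token))
    return violations


allowed = allowed_values(ET.fromstring(SCHEME))
violations = find_violations(ET.fromstring(TOPIC), allowed)
print(violations)
```

In practice this kind of check is usually performed by the editor or by the publishing toolchain; the sketch only shows the logic behind it.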
Case Study: Semiconductor Fabrication
Scenario
A client needed to document register maps for 500+ chips. Manual entry was error-prone.
Solution
We developed a custom DITA specialization for register data and wrote an XSLT transformation to auto-generate the DITA topics directly from the engineering IP-XACT files.
Result
100% accuracy
Zero manual authoring time for reference data.
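The idea behind that transformation can be sketched in a few lines. The project itself used XSLT and a custom specialization; the Python below is a stand-in, not that implementation. It emits a base DITA reference topic with a simpletable, and it assumes bit offsets and widths are plain integers. The sample register and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"ipxact": "http://www.accellera.org/XMLSchema/IPXACT/1685-2014"}

# Hypothetical IP-XACT fragment describing one register with two fields.
IPXACT_SAMPLE = """<ipxact:register
    xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014">
  <ipxact:name>CTRL</ipxact:name>
  <ipxact:addressOffset>0x04</ipxact:addressOffset>
  <ipxact:size>32</ipxact:size>
  <ipxact:field>
    <ipxact:name>ENABLE</ipxact:name>
    <ipxact:bitOffset>0</ipxact:bitOffset>
    <ipxact:bitWidth>1</ipxact:bitWidth>
  </ipxact:field>
  <ipxact:field>
    <ipxact:name>MODE</ipxact:name>
    <ipxact:bitOffset>1</ipxact:bitOffset>
    <ipxact:bitWidth>3</ipxact:bitWidth>
  </ipxact:field>
</ipxact:register>"""


def text(parent, path):
    """Return the stripped text of a namespaced child, or an empty string."""
    return parent.findtext(path, default="", namespaces=NS).strip()


def register_to_reference(register):
    """Build a base DITA <reference> topic from one IP-XACT register."""
    name = text(register, "ipxact:name")
    topic = ET.Element("reference", id="reg_" + name.lower())
    ET.SubElement(topic, "title").text = f"{name} register"
    ET.SubElement(topic, "shortdesc").text = (
        f"Offset {text(register, 'ipxact:addressOffset')}, "
        f"{text(register, 'ipxact:size')} bits wide."
    )
    table = ET.SubElement(ET.SubElement(topic, "refbody"), "simpletable")
    head = ET.SubElement(table, "sthead")
    for label in ("Field", "Bits"):
        ET.SubElement(head, "stentry").text = label
    for field in register.findall("ipxact:field", NS):
        offset = int(text(field, "ipxact:bitOffset"))
        width = int(text(field, "ipxact:bitWidth"))
        high = offset + width - 1
        bits = str(offset) if width == 1 else f"{high}:{offset}"
        row = ET.SubElement(table, "strow")
        ET.SubElement(row, "stentry").text = text(field, "ipxact:name")
        ET.SubElement(row, "stentry").text = bits
    return topic


topic = register_to_reference(ET.fromstring(IPXACT_SAMPLE))
print(ET.tostring(topic, encoding="unicode"))
```

Because the topic is generated from the engineering source, a register change only requires rerunning the transform.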
Legacy Content Conversion to DITA
Migrate your existing documentation — Word, FrameMaker, HTML, InDesign,
or proprietary XML — into clean, structured DITA topics.
-
Source Analysis & Mapping
We audit your legacy corpus to identify structural patterns, implicit topic boundaries, and reuse candidates. Every paragraph style, heading level, and inline convention is mapped to the appropriate DITA element before a single file is converted.
-
Automated Conversion Pipelines
Using industry-proven migration tools (Stilo Migrate, custom XSLT/Python scripts, and pandoc-based workflows), we automate the bulk conversion. Manual intervention is the exception, reserved for edge cases. Typical throughput: thousands of pages per sprint.
-
Post-Conversion QA
Automated Schematron validation, side-by-side visual diff against the original, and metadata completeness checks. We don't hand off until every topic passes structural and content-fidelity gates.
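The mapping step described above can be sketched as a lookup table applied to a stream of styled paragraphs. The style names, the mapping, and the function name below are hypothetical; a real mapping comes out of the source audit and covers many more cases, including inline markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from legacy paragraph styles to DITA body elements.
BODY_STYLE_MAP = {"Body Text": "p", "Note": "note", "Code Sample": "codeblock"}
TOPIC_BOUNDARY_STYLE = "Heading 1"  # assumption: each Heading 1 starts a topic


def convert(paragraphs):
    """Turn (style, text) pairs into DITA topics.

    Returns the topics plus the set of styles that need manual review,
    either because they are unmapped or because they appear before the
    first heading.
    """
    topics, needs_review = [], set()
    body = None
    for style, text in paragraphs:
        if style == TOPIC_BOUNDARY_STYLE:
            topic = ET.Element("topic", id=f"topic_{len(topics) + 1}")
            ET.SubElement(topic, "title").text = text
            body = ET.SubElement(topic, "body")
            topics.append(topic)
        elif style in BODY_STYLE_MAP and body is not None:
            ET.SubElement(body, BODY_STYLE_MAP[style]).text = text
        else:
            needs_review.add(style)
    return topics, needs_review


sample = [
    ("Heading 1", "Install"),
    ("Body Text", "Run the installer."),
    ("Sidebar", "Tip: close other applications first."),
    ("Heading 1", "Configure"),
    ("Code Sample", "set MODE=1"),
]
topics, needs_review = convert(sample)
for topic in topics:
    print(ET.tostring(topic, encoding="unicode"))
print("Needs review:", needs_review)
```

Reporting unmapped styles instead of guessing keeps the edge cases visible for manual handling.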
Deduplication at Conversion Time
During conversion, we analyze every paragraph and span across your entire document collection using structured content analysis. Exact-match and near-match duplicates are identified, consolidated into reusable warehouse topics with conref pointers, and tracked in a deduplication report — saving you from discovering redundancy months later. Typical result: 15–30% of content identified as duplicate and consolidated in the first pass.
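A minimal sketch of exact-match and near-match detection, using only the Python standard library. The 0.9 similarity threshold, the paragraph IDs, and the function names are illustrative assumptions, and the pairwise comparison shown here would need indexing or hashing tricks to scale to a large corpus.

```python
import hashlib
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def find_duplicates(paragraphs, threshold=0.9):
    """Return (exact, near) lists of (first_id, duplicate_id) pairs.

    paragraphs is a list of (paragraph_id, text) tuples.
    """
    exact, near = [], []
    seen = {}    # content hash -> id of first occurrence
    unique = []  # (id, normalized text) of paragraphs kept so far
    for pid, text in paragraphs:
        norm = normalize(text)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            exact.append((seen[digest], pid))
            continue
        seen[digest] = pid
        for other_id, other_norm in unique:
            if SequenceMatcher(None, other_norm, norm).ratio() >= threshold:
                near.append((other_id, pid))
                break
        unique.append((pid, norm))
    return exact, near


sample = [
    ("a1", "Press the power button to turn on the device."),
    ("b7", "Press the  power button to turn on the device."),
    ("c3", "Press the power button to turn off the device."),
    ("d9", "Contact support if the LED blinks red."),
]
exact, near = find_duplicates(sample)
print("exact:", exact)
print("near:", near)
```

Exact matches are safe to consolidate automatically. Near matches, like the "on" versus "off" pair in the sample, call for human review before they are merged into a shared topic.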
Information Architecture for DITA
Structure that scales. We design the IA layer that turns a collection of topics
into a navigable, filterable, reusable content system.
What We Define
- Topic typing rules — when to use concept vs. task vs. reference vs. troubleshooting
- Map hierarchy design — bookmap, submap, and relationship table structures for your deliverables
- Key architecture — keydef maps, key scopes, and variable resolution strategies for multi-product content
- Reuse strategy — conref, conkeyref, and content-reference library patterns that scale without fragility
- Metadata & filtering model — @audience, @platform, @product, @rev attributes plus SubjectScheme-controlled values
- Naming & folder conventions — file naming, directory structure, and ID strategies that work across CCMS, Git, and CI pipelines
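As an illustration of the key architecture item, the sketch below resolves a product-name variable from two alternative keydef maps. The maps, the key name, and the function names are hypothetical, and key scopes and map hierarchies are left out; in a real pipeline the publishing toolchain performs this resolution.

```python
import xml.etree.ElementTree as ET

# Two hypothetical keydef maps, one per product.
KEYS_PRO = """<map>
  <keydef keys="product-name">
    <topicmeta><keywords><keyword>Widget Pro</keyword></keywords></topicmeta>
  </keydef>
</map>"""

KEYS_LITE = """<map>
  <keydef keys="product-name">
    <topicmeta><keywords><keyword>Widget Lite</keyword></keywords></topicmeta>
  </keydef>
</map>"""

# A single shared topic that refers to the key instead of hard-coding a name.
TOPIC = """<topic id="install">
  <title>Install</title>
  <body><p>Install <ph keyref="product-name"/> today.</p></body>
</topic>"""


def key_table(map_xml):
    """Return {key: text}; the first definition of a key in the map wins."""
    table = {}
    for keydef in ET.fromstring(map_xml).iter("keydef"):
        keyword = keydef.find("topicmeta/keywords/keyword")
        if keyword is None:
            continue
        for key in keydef.get("keys", "").split():
            table.setdefault(key, keyword.text)
    return table


def resolve(topic_xml, table):
    """Fill empty <ph keyref="..."> elements with the text bound to the key."""
    root = ET.fromstring(topic_xml)
    for ph in root.iter("ph"):
        key = ph.get("keyref")
        if key in table and not (ph.text or "").strip():
            ph.text = table[key]
    return root


def paragraph_text(root):
    return "".join(root.find("body/p").itertext())


pro_text = paragraph_text(resolve(TOPIC, key_table(KEYS_PRO)))
lite_text = paragraph_text(resolve(TOPIC, key_table(KEYS_LITE)))
print(pro_text)
print(lite_text)
```

The same source topic yields a different sentence for each product, which is the point of keeping variable text in key definitions.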
Governance Deliverables
Every IA engagement produces a DITA Style Guide — a living document that codifies topic templates, element usage rules, metadata requirements, and writing patterns. This becomes your team's single source of truth for content standards.
AI-Ready Content Engineering
Your DITA content is already structured. We make it machine-intelligent — optimized for retrieval, embedding, and LLM grounding.
-
RAG-Optimized Chunking
We size and structure topics for optimal vector embedding — right-sized for token limits, self-contained for retrieval accuracy. Short descriptions become embedding-friendly summaries. Metadata becomes filter context.
-
Semantic Labeling
Every topic carries machine-readable labels — topic type, product version, audience, and domain context. When an LLM retrieves a chunk, it knows what kind of content it's citing and can attribute the source precisely.
-
JSON-LD & Knowledge Graph Mapping
We map DITA metadata and SubjectScheme taxonomies to Schema.org types and RDF triples. Your documentation becomes a queryable knowledge graph — not just a file system of XML documents.
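A minimal sketch of how one topic might become a retrieval chunk and a JSON-LD record. The field names in the chunk, the choice of the Schema.org TechArticle type, and the sample topic are assumptions made for illustration; an actual mapping depends on the taxonomy and the target vector store.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical DITA task topic.
TOPIC = """<task id="reset-device" audience="admin" product="widget-pro">
  <title>Reset the device</title>
  <shortdesc>Restore factory settings when the device stops responding.</shortdesc>
  <taskbody><steps>
    <step><cmd>Hold the power button for ten seconds.</cmd></step>
    <step><cmd>Release the button when the LED blinks.</cmd></step>
  </steps></taskbody>
</task>"""


def topic_to_chunk(topic_xml):
    """Flatten one topic into a chunk: text to embed plus filterable metadata."""
    root = ET.fromstring(topic_xml)
    # The body element is named body, taskbody, conbody, or refbody.
    body = next((child for child in root if child.tag.endswith("body")), None)
    text = " ".join(" ".join(body.itertext()).split()) if body is not None else ""
    return {
        "id": root.get("id"),
        "title": root.findtext("title", default=""),
        "summary": root.findtext("shortdesc", default=""),
        "text": text,
        "metadata": {
            "topic_type": root.tag,
            "audience": root.get("audience"),
            "product": root.get("product"),
        },
    }


def chunk_to_jsonld(chunk):
    """Express the chunk's descriptive metadata as a Schema.org record."""
    doc = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "identifier": chunk["id"],
        "headline": chunk["title"],
        "description": chunk["summary"],
    }
    audience = chunk["metadata"].get("audience")
    if audience:
        doc["audience"] = {"@type": "Audience", "audienceType": audience}
    return doc


chunk = topic_to_chunk(TOPIC)
jsonld = chunk_to_jsonld(chunk)
print(json.dumps(chunk, indent=2))
print(json.dumps(jsonld, indent=2))
```

The topic boundary supplies the chunk boundary, the short description supplies the summary, and the attributes supply the filter fields, so no heuristic text splitting is needed.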
From Documentation to Data Asset
Well-engineered DITA gets you roughly 70% of the way to an AI content pipeline. The topic architecture provides natural chunk boundaries, metadata provides filter dimensions, and short descriptions provide ready-made summaries. We close the remaining 30% with transform plugins, embedding pipelines, and delivery API integration.
Delivery & Hosting
Content that's structured but not delivered is content that doesn't exist.
We build the last mile — from DITA source to live, accessible output.
-
Static Site Publishing
DITA-OT builds HTML5 output deployed to AWS S3 + CloudFront, Azure Blob, Netlify, or GitHub Pages. CI/CD pipelines trigger rebuilds on every commit — your documentation site is always current.
-
Headless Content APIs
Transform DITA to JSON and serve through REST or GraphQL endpoints. Applications, chatbots, and embedded help panels pull exactly the content they need at runtime — no monolithic help portal required.
-
PDF & Print
Branded PDF output via XSL-FO or CSS Paged Media (Prince XML, Antenna House). We build PDF plugins that match your corporate identity — cover pages, headers, footers, legal boilerplate — all templated and automated.
-
Portal & CMS Integration
Publish DITA output into Drupal, WordPress, Salesforce Knowledge, or SharePoint. We build the connector layer that maps topics to CMS content types, preserves navigation hierarchy, and handles incremental updates.
-
Context-Sensitive Help
Map topics to application screens using resource IDs. We configure the delivery layer so your software's help button retrieves the right topic for the current context — in-app panels, modals, or external links.
-
Hosted Documentation Portals
Fully managed documentation hosting with search, versioning, access control, and analytics. We set up and maintain the infrastructure so your team focuses on content, not servers.
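The context-sensitive help lookup can be sketched as an index built from resource IDs in the map. This assumes the IDs are recorded as resourceid elements with appname and appid attributes inside each topicref's topicmeta; the map content, screen IDs, fallback topic, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical map that ties application screens to help topics.
HELP_MAP = """<map>
  <topicref href="login.dita">
    <topicmeta><resourceid appname="webapp" appid="screen.login"/></topicmeta>
  </topicref>
  <topicref href="billing.dita">
    <topicmeta><resourceid appname="webapp" appid="screen.billing"/></topicmeta>
  </topicref>
</map>"""


def build_help_index(map_xml):
    """Return {(appname, appid): href} for every resource ID in the map."""
    index = {}
    for topicref in ET.fromstring(map_xml).iter("topicref"):
        for rid in topicref.findall("topicmeta/resourceid"):
            index[(rid.get("appname"), rid.get("appid"))] = topicref.get("href")
    return index


def help_topic(index, appname, screen_id, fallback="index.dita"):
    """Pick the topic for the current screen, or a fallback landing topic."""
    return index.get((appname, screen_id), fallback)


index = build_help_index(HELP_MAP)
print(help_topic(index, "webapp", "screen.billing"))
print(help_topic(index, "webapp", "screen.unknown"))
```

Falling back to a landing topic means a screen without a mapped topic still opens useful help rather than an error.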
DITA-OT Services
The DITA Open Toolkit is the engine behind every output. We keep it running, extend it, and upgrade it.
-
DITA-OT Upgrades
Migrating from DITA-OT 2.x or 3.x to 4.x? We handle the upgrade — testing your existing plugins for compatibility, updating deprecated Ant targets, resolving XSLT conflicts, and validating output parity against your current builds.
-
Custom Plugin Development
We build DITA-OT plugins for branded HTML5 themes, custom PDF layouts, JSON/YAML output, SCORM packaging, Salesforce Knowledge export, and any transform target your pipeline requires. Delivered with unit tests and documentation.
-
CI/CD Pipeline Integration
DITA-OT wrapped in Docker containers, triggered by GitHub Actions, Jenkins, GitLab CI, or Azure DevOps. We build the pipeline that validates, transforms, and deploys on every merge — with build failure notifications and output artifact archiving.
-
Performance Optimization
Slow builds with large document sets? We profile your DITA-OT pipeline, optimize XSLT transforms, configure parallel processing, and implement incremental builds so only changed topics are re-rendered.
-
Troubleshooting & Support
Broken builds, rendering inconsistencies, plugin conflicts, and encoding issues. We diagnose and fix DITA-OT problems — and document the root cause so your team can avoid repeat issues.
-
DITA-OT Managed Service
Ongoing maintenance contract: we monitor your build pipelines, apply DITA-OT patch releases, update plugins for compatibility, and provide a dedicated support channel for your publishing team. SLA-backed response times.
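One way to approach the incremental builds mentioned under Performance Optimization is a content-hash manifest kept by the pipeline. The sketch below is a pipeline-side illustration, not a DITA-OT feature; the file layout, manifest format, and function names are assumptions. It does not track conref or key dependencies, so an unchanged topic that reuses changed content would still need to be detected separately.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(path):
    """Hash the file contents so timestamps alone do not trigger rebuilds."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_topics(source_dir, manifest_path):
    """Return topics whose content differs from the last recorded build.

    The manifest is rewritten on every call to reflect the current state.
    """
    source_dir, manifest_path = Path(source_dir), Path(manifest_path)
    previous = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    current = {
        str(path.relative_to(source_dir)): fingerprint(path)
        for path in sorted(source_dir.rglob("*.dita"))
    }
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    manifest_path.write_text(json.dumps(current, indent=2))
    return changed


# Demonstration in a temporary directory with two hypothetical topics.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "a.dita").write_text("<topic id='a'><title>A</title></topic>")
    (src / "b.dita").write_text("<topic id='b'><title>B</title></topic>")
    manifest = Path(tmp) / "manifest.json"

    first = changed_topics(src, manifest)   # no manifest yet: everything is new
    second = changed_topics(src, manifest)  # nothing edited since the last run
    (src / "b.dita").write_text("<topic id='b'><title>B2</title></topic>")
    third = changed_topics(src, manifest)   # only the edited topic

print(first, second, third)
```

The returned list can then drive which maps or topics are passed to the publishing step in CI.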
Sample Content Assessment
Send us a sample of your DITA source — or your current docs if you're pre-DITA. We'll assess your information model, specialization opportunities, and DITA-OT pipeline, and return a concrete engineering plan. No commitment required.
Submit a sample →