Knowledge bases for prototyping

Common characteristics of our sample knowledge bases (KBs)

Our eight “sandbox” knowledge bases, described in more detail and linked below, have the following common characteristics:

  1. Contain information (e.g., grocery shopping, community and family facts and history) that is “timeless” and generally understood by almost any target audience
  2. Contain specific “ground-truth” information that is already known to (and likely written by) us, so we can better formulate prompts and queries for a model and evaluate its responses
  3. Are multimodal, which lets us challenge the technical capabilities of various models

In addition, the first two (“Shopping for Groceries” and “Cleaning the Garage”) are simple and small, which makes them perfect for prototyping.

KB #1: “Shopping for groceries”

Facts, features, flaws
  • Structured document in DITA/xml; illustrates basic DITA concepts and features
  • Very small project size (7 topics, including concept, task, and reference material)
  • Content lacks interest, relevance
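
For readers new to DITA, a map for a project of this size might look roughly like the sketch below. The file names and nesting are hypothetical placeholders, not the actual “Shopping for groceries” source files, and only some of the 7 topics are shown; the sketch only illustrates how a ditamap ties concept, task, and reference topics together.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Illustrative sketch only: every href below is a hypothetical file name -->
<map>
  <title>Shopping for groceries</title>
  <!-- A concept topic introduces the subject -->
  <topicref href="about_grocery_shopping.dita" type="concept">
    <!-- Task topics describe step-by-step procedures -->
    <topicref href="making_a_shopping_list.dita" type="task"/>
    <topicref href="choosing_produce.dita" type="task"/>
    <!-- A reference topic holds lookup material -->
    <topicref href="store_sections.dita" type="reference"/>
  </topicref>
</map>
```
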
Sample published content

“Shopping for Groceries” (WebHelp Responsive)
“Shopping for Groceries” (PDF)

KB #2: “Cleaning the garage”

Facts, features, flaws
  • Structured document in DITA/xml; illustrates more sophisticated DITA features like multiple ditamaps and filtering using a ditaval file
  • Small project size (20 topics)
  • Content lacks interest, relevance
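
As a rough illustration of the ditaval filtering mentioned above, a filter file might look like the sketch below. The attribute name (audience) and its values (internal, external) are assumptions chosen for illustration; they are not taken from the “Cleaning the garage” source files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: the audience values are hypothetical -->
<val>
  <!-- Drop any element marked audience="internal" from the output -->
  <prop att="audience" val="internal" action="exclude"/>
  <!-- Keep any element marked audience="external" -->
  <prop att="audience" val="external" action="include"/>
</val>
```

With the DITA Open Toolkit, a file like this is typically supplied at build time (for example, through the --filter option of the dita command), so the same set of topics can yield different published outputs.
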
Sample published content

“Cleaning the Garage” (WebHelp Responsive)
“Cleaning the Garage” (PDF)

KB #3: “DITAinformationcenter”

Facts, features, flaws
  • DITA-based structured information project of moderate complexity containing approximately 350 source files
  • From 2006 to 2011, it contained “ground-truth” information about DITA and the DITA Open Toolkit
  • Now (2025) the content is outdated and sometimes misleading
  • We are no longer SMEs in this space
Sample published content

“DITAinformationcenter” (PDF)

KB #4: “Computer history”

Facts, features, flaws
  • The source files are in multiple formats: some are in DITA/xml, and others were written in HTML and originally published as website posts
  • The narrative content is based mostly on our experiences volunteering at the Computer History Museum
  • Adding research-based information to the knowledge collection would be difficult and impractical
Sample published content

“My high-tech adventure” (PDF)
“Computer games from the past” (PDF)
“Anker-Werke banking machine” (PDF)

KB #5: “Astronomy images”

Facts, features, flaws
  • DITA-based structured information project published to WebHelp Responsive
  • Collaborative project (human/AI)
  • Images are displayed by category (e.g., galaxy, nebula, planet), each with a short description
  • Object descriptions are relatively “standard” and “timeless”
  • Indexed to help people find particular images
  • Ideal way to share a hobby-based collection of images with friends and family
Sample published content

“Astronomy images” (WebHelp Responsive)

KB #6: “Community information project” (El Dorado Hills, California)

Facts, features, flaws
  • Written in structured DITA/xml; published to WebHelp Responsive
  • Collaborative project (human/AI)
  • Illustrates how to update out-of-date information with the help of AI assistants and agents
  • Example files show how to set up a project and prepare for future automation
  • Includes a style guide and glossary
  • Model for similar group/community projects
  • Major issues include: (1) challenge of morphing a book into a web-based topic collection, (2) challenge of updating inherently volatile information after 25 years of inactivity, (3) lack of a current human owner or verifier
Sample published content
WebHelp Responsive site containing information available to general audiences:

“EDH Community Information (2026-01-13)” (website external)

WebHelp Responsive site containing the external information plus “behind the scenes” information relevant only to the creation and collaboration team:

“EDH Community Information (2026-01-13)” (project internal)

PDF file of the original (2003) El Dorado Hills Handbook:

“EDH Handbook (2003)” (PDF)

Knowledge bases #7 and #8: Family history, genealogy

These two multimodal knowledge bases contain “ground-truth” information about ourselves and our ancestors; for example:

  • Typical genealogical facts, charts and diagrams from family trees
  • Photocopies of original records and record indexes
  • Family history books, papers and posts
  • Photographs of individuals and events
Facts, features, flaws
  • The source files are not well integrated (e.g., some are in genealogy apps, others are narratives written by a number of people, and many are images in various formats)
  • Dozens or perhaps hundreds of collaborators have contributed to the collections, and some of the content files contain factual contradictions
  • We have had the most success in asking our AI assistants for contextual information to supplement our current collections
  • In 2026 we’re hoping to define and create a WebHelp Responsive genealogy and family history collection
Sample published content

“Pedigree chart” (PDF)
“Family history book” (PDF)
“Paper: compiled lineage” (PDF)
“Web post: birthday tribute” (PDF) (AI-enhanced)
“Web page: family page” (PDF)

Local knowledge base creation, curation, and transformation for AI/RAG processing