All posts by SysAdmin

Updating community information: Establishing an AI-powered, semi-automated workflow

Recently we’ve been heavily involved in preparing one of our prototype knowledge bases (#4: El Dorado Hills CA community information) for a major reorganization and content update. The original community information content was created in 2003, and we are updating it in 2025 for the first time. 

Please note that this is a prototype, created solely for educational purposes (for example, to explain and show how to create knowledge bases for RAG processing). We no longer have much personal knowledge of the El Dorado Hills community, so we are relying on information that is available to us electronically, most of which (we hope) Perplexity will deliver in response to a focused prompt.

If you need more background information…

Here’s a link to a recent post with a more detailed overview of this project:

Using DITA/xml, GenAI tools and RAG processing to establish a community information collection

Here’s a link to the alpha version of our El Dorado Hills WebHelp Responsive community information collection:

“El Dorado Hills community information” (WebHelp Responsive by oXygen, opens new window)

Major sections of our EDH project

Major sections of the EDH WebHelp Responsive project

Our alpha prototype is the output from DITA/xml input, created using oXygen/Positron, and hosted on this website.

The README section contains overview information about the EDH project, and it will need a human edit. The “Social history” and “Natural environment” sections have fairly static information, and shouldn’t need much updating.

The part of the information collection that is most in need of a content update is the “Current community information” section, which we will work on first.

Topics we’re updating first

The “Current community information” section contains the following kinds of topics:

"Current community" table of contents

A lot of the update work will fall to us (Anna and Dick), as human authors and editors, but we want to make use of as many AI assistants and agents as we can. We also want to start to establish a semi-automated workflow now with an eye to a more fully automated process in the future.

Example contents and workflow (“Location” topic)

Our original EDH document was created in unstructured FrameMaker, where a content update like this one would have been much more difficult. Our current prototype source is in DITA/xml, a topic-based framework, which makes our update tasks much easier.
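
For readers who have not worked with DITA, a topic is a small, self-contained XML file with an id, a title, and a body. The sketch below builds a minimal one in Python using only the standard library. The id, title, and paragraph text are placeholders assumed for illustration; they are not taken from our actual source files, and real topics also carry a DOCTYPE declaration that is omitted here.

```python
import xml.etree.ElementTree as ET


def build_topic(topic_id: str, title: str, paragraphs: list[str]) -> ET.Element:
    """Build a minimal DITA <topic>: an id, a title, and a body of paragraphs."""
    topic = ET.Element("topic", {"id": topic_id})
    ET.SubElement(topic, "title").text = title
    body = ET.SubElement(topic, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text
    return topic


# Placeholder values, assumed for illustration only.
location = build_topic(
    "location",
    "Location",
    ["Placeholder paragraph describing where the community is."],
)
topic_xml = ET.tostring(location, encoding="unicode")
print(topic_xml)
```

Because each topic is its own file with its own id, it can be edited, replaced, or translated without touching the rest of the collection.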

We decided to put all the text, notes and images we need to update a given topic in one place, and when the work is complete we’ll delete everything but the actual 2025 update (and perhaps also the Spanish translation).

As an example, here is the temporary “table of contents” for the “Location” topic (part of the “Current community information” section, see the prior image), which deals with the geography of El Dorado Hills and environs.

Subtopics currently assigned to “Location”

During the entire update process, all of these files will be available to us and our primary AI assistants, Perplexity and oXygen/Positron, but when the 2025 content update is complete, the other files will simply be archived or removed.

One benefit of using DITA/xml is that each of the topics shown above can be processed and published separately, and shared among the team members as we work. In our case, the “team” includes Dick, Anna, Perplexity, Positron, and any other AI assistants and agents we might decide to use for a given task. So the “knowledge base” we’re using for all generation and processing tasks (RAG and otherwise) is expanded and made more flexible. And the “ground truth” on which we anchor our published results is right under our noses and instantly available!
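
As a rough illustration of why separate processing is easy, here is a minimal Python sketch that reads a DITA map and lists the topic files it references, so that each file can be handed to a person or an AI assistant on its own. The map contents and file names are hypothetical; our real map differs.

```python
import xml.etree.ElementTree as ET

# Hypothetical map for the "Location" working files; all file names are assumptions.
LOCATION_MAP = """\
<map>
  <title>Location (working files)</title>
  <topicref href="location_2003.dita"/>
  <topicref href="location_2025_draft.dita"/>
  <topicref href="location_2025_es.dita"/>
  <topicref href="location_notes.dita"/>
</map>
"""


def list_topic_files(map_xml: str) -> list[str]:
    """Return the href of every <topicref> in the map, in document order."""
    root = ET.fromstring(map_xml)
    return [ref.get("href") for ref in root.iter("topicref") if ref.get("href")]


topic_files = list_topic_files(LOCATION_MAP)
for href in topic_files:
    print(href)
```

Once the update is complete, removing the working files from the collection would amount to deleting their `topicref` entries from a map like this one.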

So far, we’ve cleaned up the 2003 “Location” file, given it to Perplexity to update, and edited the results. Then we gave it back to Perplexity to put into Spanish. We’ve also recorded some notes on content, process, and lessons learned.
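
The "give it to Perplexity" step could eventually be scripted. The sketch below shows one possible first half of that step: pulling the plain text out of a DITA topic and wrapping it in a focused update prompt. It makes no API call, and the prompt wording, function names, and sample topic are all assumptions for illustration, not the prompt we actually used.

```python
import xml.etree.ElementTree as ET

# Placeholder 2003 topic; the text is illustrative, not our actual content.
TOPIC_2003 = """\
<topic id="location">
  <title>Location</title>
  <body>
    <p>First placeholder paragraph from 2003.</p>
    <p>Second placeholder paragraph from 2003.</p>
  </body>
</topic>
"""


def topic_text(topic_xml: str) -> tuple[str, str]:
    """Return (title, body text) for a topic, with paragraphs separated by blank lines."""
    root = ET.fromstring(topic_xml)
    title = root.findtext("title", default="").strip()
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("p")]
    return title, "\n\n".join(paragraphs)


def build_update_prompt(topic_xml: str, source_year: int, target_year: int) -> str:
    """Wrap the topic text in a hypothetical content-update prompt."""
    title, text = topic_text(topic_xml)
    return (
        f"Update the following '{title}' topic, written in {source_year}, "
        f"so that it is accurate for {target_year}. "
        "Keep the structure, and list the sources you used.\n\n"
        f"{text}"
    )


prompt = build_update_prompt(TOPIC_2003, 2003, 2025)
print(prompt)
```

The resulting string can be pasted into an assistant by hand for now, which keeps the human edit in the loop while the workflow is still semi-automated.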

Content, process and lessons learned

DITA/xml files for the “Fast facts” section

Here are the DITA/xml input files for the “Fast facts” section.

DITA/xml files

Original 2003 “Location” topic, lightly edited

Here is part of the original 2003 “Location” topic.

"Location" topic, 2003

Perplexity-generated 2025 draft, based on the 2003 content and additional research

Here is a draft 2025 version of the same file, which was generated by Perplexity and edited by Anna.

Draft of the updated 2025 version

Map problems and solutions

I was hoping for a new, AI-generated color map modeled on the original black-and-white map, but Perplexity was unable to generate a satisfactory one, even after multiple tries.

Perplexity’s map, third try

As I understand it, Perplexity (like many other AI assistants) doesn’t yet have the integrated visual analysis tools needed to interpret, validate, or generate detailed geographic maps. However, Perplexity pointed me to Mapcarta, which provided the color map that I annotated and included in the 2025 draft, above.

Spanish translation

Here is the first draft of a Spanish translation, which Perplexity generated directly from the updated (2025) content. It is an early experiment. For the next version, I’ll try multiple AI assistants and other translation providers, and will compare the results with the help of (human) Spanish language experts.

Draft Spanish translation

For more information

Here is a link to the PDF of the entire “Location” topic (I’m calling it the “beta1” version) as it exists today:

Location topic Beta1 (PDF file)