
Citogenesis – briefly explained

As the odyssey from Algorizmi to algorithm shows, citogenesis is not a phenomenon unique to the internet and AI age: it has existed for more than 150 years. But what exactly lies behind the term? An overview with further references.

Citation loop of misinformation

The term citogenesis refers to a cycle in which false or unverified information appears to become true through repeated citation. The webcomic xkcd coined the term in 2011. An incorrect claim is taken from an unsubstantiated source, cited again, and quasi-legitimized as it circulates ever more widely. With every additional reference, its apparent credibility grows, even though the information was never correct.
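The loop described above can be modeled as a directed citation graph in which an edge A → B means "A cites or copies from B". The following is a minimal sketch, assuming a hypothetical, hand-built graph (all document names are invented for illustration): a depth-first search reports a cycle, the pattern in which an encyclopedia entry ends up citing a newspaper article that was itself copied from that entry.

```python
from typing import Dict, List, Optional

# Hypothetical citation graph: edge A -> B means "A cites or copies from B".
# All document names are invented for illustration.
CITES: Dict[str, List[str]] = {
    "encyclopedia_entry": ["newspaper_article"],
    "newspaper_article": ["encyclopedia_entry"],  # copied from the entry
    "textbook": ["newspaper_article"],
    "primary_source": [],
}


def find_citation_loop(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one citation cycle as a list of documents, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for cited in graph.get(node, []):
            state = color.get(cited, WHITE)
            if state == GRAY:  # back edge: the cited document is already on the path
                return path[path.index(cited):] + [cited]
            if state == WHITE:
                cycle = visit(cited)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in graph:
        if color[start] == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_citation_loop(CITES))
```

For the sample graph, the search returns the loop encyclopedia_entry → newspaper_article → encyclopedia_entry. Many real cases are linear chains back to a single weak source rather than closed loops, so a cycle is only one possible symptom.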

Citogenesis cycle diagram (Wikipedia)
Image source: https://en.wikipedia.org/wiki/Wikipedia:List_of_citogenesis_incidents

Self-reinforcement through AI

In the age of AI, citogenesis gains particular relevance: language models, automated research tools, and generative systems can inadvertently amplify such errors, because widely circulating misinformation often ends up in their training data. When information is reproduced without checking its sources, a self-reinforcing loop arises through which errors can spread far faster than in the past.

Once misinformation is firmly embedded in an AI model, removing it is usually very difficult. Large deep neural networks encode what they learn from their training data in their model parameters, numerical weights that are not easily interpretable. The original information is therefore no longer stored as a single "data record" but as a pattern spread across the entire model.

Removing misinformation from AI models, especially those trained on extremely large and widely used datasets, is therefore rarely feasible. This creates a vicious circle that is hard to break: eliminating false data requires major investment, yet the return on that investment, better data, is hard to quantify. This makes transparent source attribution, a verifiable data basis, and robust fact-checking mechanisms ever more important for safeguarding the integrity of digital knowledge spaces.

Example: Al-Khwarizmi and “Algorithums”

The project "The Odyssey from Algorizmi to Algorithm" by van-Helsing.ai illustrates how, through citogenesis spanning more than 150 years, an unsubstantiated conjecture became a perceived truth. The starting point was a hypothesis that Joseph Reinaud explicitly labeled as such in 1849: he suspected that Al-Khwarizmi gave his name to the word "algorithm." Although not a single primary source supported it, the conjecture was later elevated to fact. The decisive drivers were above all German scholars, who claimed the thesis had been proven. From then on, encyclopedias in every language propagated the narrative without evidence. Today millions of texts repeat it, yet no one has seriously checked whether it was ever correct.

The full story as a PDF: van-Helsing.ai – The Odyssey from Algorizmi to Algorithm (166 pages, as of December 2025)

Breaking citogenesis with algorithms

Why is this a problem, and why does it matter? The content of AI models is increasingly diluted by ever more frequent citogenesis, because the models treat how often a statement appears as an indicator of its truth. Countless cross-links amplify this effect, since they also signal "authenticity" to an AI. That is exactly what makes the problem so tricky: it is rooted in the mismatch between verifiability and citation frequency. Repetition does not turn misinformation into truth; it only turns it into globally misleading information. The impact should not be underestimated, because algorithms are increasingly becoming a substitute for religion, a worldview in their own right.

AI can help uncover citogenesis in various ways, including citation analysis, which establishes who cited whom and when. On that basis, individual citations can be checked for correctness.
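Citation analysis of this kind can be sketched as a graph traversal: follow every citation chain behind a statement back to the documents that cite nothing further (its roots), then contrast the number of documents involved with the roots they actually rest on. This is a minimal sketch under stated assumptions: the graph, the document names, and the status labels are hypothetical and only loosely mirror the pattern described in the text; they are not taken from the van-Helsing.ai study. A real tool would first have to extract the citations (and their dates) from the documents themselves.

```python
from typing import Dict, List, Set

# Hypothetical citation graph for a single claim: edge A -> B means
# "document A cites document B for the claim". Names are illustrative only.
CITES: Dict[str, List[str]] = {
    "ai_answer": ["encyclopedia_a", "encyclopedia_b", "blog_post"],
    "blog_post": ["encyclopedia_a"],
    "encyclopedia_a": ["scholar_claims_proof"],
    "encyclopedia_b": ["encyclopedia_a", "scholar_claims_proof"],
    "scholar_claims_proof": ["conjecture_1849"],
    "conjecture_1849": [],  # cites nothing further: a root of the chain
}

# Hypothetical verification status of root documents; missing entries
# are reported as "unverified".
ROOT_STATUS: Dict[str, str] = {"conjecture_1849": "conjecture"}


def upstream_of(doc: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every document reachable from `doc` by following citations."""
    seen: Set[str] = set()
    stack = list(graph.get(doc, []))
    while stack:
        node = stack.pop()
        if node in seen:  # skip repeats; this also stops endless citation loops
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


def audit(doc: str, graph: Dict[str, List[str]],
          status: Dict[str, str]) -> Dict[str, object]:
    """Contrast apparent support (upstream documents) with the roots it rests on."""
    upstream = upstream_of(doc, graph)
    roots = sorted(n for n in upstream if not graph.get(n))
    return {
        "upstream_documents": len(upstream),
        "root_sources": roots,
        "root_status": {r: status.get(r, "unverified") for r in roots},
    }


if __name__ == "__main__":
    print(audit("ai_answer", CITES, ROOT_STATUS))
```

For the sample graph, five upstream documents collapse onto a single root labeled as a conjecture: high citation frequency, one unverified origin. An empty root list with a non-empty upstream set would mean the claim only circles back on itself.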

Citation analysis vs. citogenesis
Image source: https://de.wikipedia.org/wiki/Zitationsanalyse

But AI also helps in other ways, even without citation analysis: for example, by searching the internet for contradictions to a particular claim. AI also makes it possible to find, translate, and interpret original documents, some of them very old, in vast online archives. In addition, community encyclopedias such as Wikipedia and AI encyclopedias such as Grokipedia complement each other: a suspected case of citogenesis on one portal can be validated by cross-checking it on the other.
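A cross-check on a second portal only helps if the two portals rest on different evidence. The following is a minimal heuristic sketch, assuming each portal's references for a claim have already been extracted into a set of source identifiers (the identifiers below are hypothetical; no Wikipedia or Grokipedia API is used): it separates shared sources from those only one portal cites and flags when one portal's sources are a subset of the other's.

```python
from typing import Dict, Set


def cross_check(refs_a: Set[str], refs_b: Set[str]) -> Dict[str, object]:
    """Compare the sources two portals cite for the same claim (heuristic)."""
    return {
        "shared": sorted(refs_a & refs_b),
        "only_a": sorted(refs_a - refs_b),
        "only_b": sorted(refs_b - refs_a),
        # If one portal's sources are a subset of the other's, the second portal
        # offers no independent check; it can only echo the first.
        "independent_check_possible": not (refs_a <= refs_b or refs_b <= refs_a),
    }


if __name__ == "__main__":
    # Hypothetical reference lists for one claim on two portals.
    portal_a = {"source_x", "source_y"}
    portal_b = {"source_x"}
    print(cross_check(portal_a, portal_b))
```

Different reference lists can still trace back to the same original source, so this comparison is only a first filter before checking where each cited source got the claim from.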

Algorithm Etymology

  • Download the study overview here
  • Download the whole study here

Yuval Noah Harari describes "algorithm" as one of the central concepts of our time. In …

In dictionaries, encyclopedias, and specialist literature, the dominant explanation is still that the word algorithm arose …

Thesis A examines whether the alternative derivation proposed by the RAE (Real Academia Española) can be reconstructed …

Thesis B tests whether the RAE idea from Thesis A can be found in medieval …

Thesis C examines when the now-dominant al-Ḫwārizmī narrative emerged. It is established that it …

At the end of the analysis, an overall conclusion is drawn: all three theses (A the …