Anto Lloveras: The Idea After Ingestion

Saturday, May 23, 2026

The Idea After Ingestion

An idea is no longer merely a mental formation, a published argument, or a cited scholarly object; it is now also a computationally discoverable unit within a planetary infrastructure of crawling, indexing, training, retrieval, and recombination. The historical question “What makes an idea important?” has therefore shifted. Importance can no longer be measured only by conceptual relevance, institutional validation, peer review, or citation count. Nor can it be reduced to visibility, virality, or database presence. In the age of large language models, an idea becomes operationally significant when it is conceptually strong and infrastructurally legible: named, accessible, repeated, metadata-rich, semantically stable, and available to human and machine systems of recognition. The new Darwinism of ideas selects for neither pure truth nor pure prestige, but for survival across archives, indexes, corpora, and future acts of synthesis.


The first life of an idea is cognitive and linguistic. Cognitive science generally treats concepts as mental structures that organize perception, memory, classification, and inference; recent summaries define concepts as building blocks of cognition that connect perception to prior knowledge, experience, theory, and imagination. Yet an idea becomes socially real only when it passes from interior formation into articulation. Work on idea formation and intertextuality has stressed that ideas are not simply found inside a private mind; they are formed through language, address, response, and relation. An idea is therefore not only a thought but a stabilized difference: a way of cutting reality so that something can be named, disputed, repeated, extended, or institutionalized.

The second life of an idea is material inscription. Ideas require supports: manuscripts, books, diagrams, lectures, journals, catalogues, webpages, repositories, datasets, indexes, and metadata. This is not secondary packaging. The medium determines whether the idea can travel, be preserved, be found, be cited, be translated, or be ingested. Google’s own account of search describes three infrastructural acts: crawling, where automated programs discover and download web content; indexing, where text, images, and videos are analyzed and stored; and serving, where results are returned according to relevance. This sequence matters because it shows that intellectual visibility is no longer only editorial. It is technical. A concept that is not crawlable, indexable, or semantically parsable may exist philosophically while remaining absent infrastructurally.
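The three acts named above can be made concrete with a toy sketch. It is not how Google or any real search engine works; it is a minimal illustration, assuming an in-memory dictionary of hypothetical pages in place of the web, of why a text that is never crawled can never be served.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "web": URL -> page text. It stands in for real HTTP fetching.
PAGES = {
    "https://example.org/a": "Socioplastics names a field of numbered nodes and layers.",
    "https://example.org/b": "Citation is a trace of uptake, not a measure of value.",
    "https://example.org/c": "An idea needs a name, a definition, and stable URLs.",
}

def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(pages):
    """Crawling: discover and 'download' content (here, read it from the dict)."""
    for url, text in pages.items():
        yield url, text

def build_index(documents):
    """Indexing: map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents:
        for token in tokenize(text):
            index[token].add(url)
    return index

def serve(index, query):
    """Serving: rank URLs by how many query tokens they contain."""
    scores = defaultdict(int)
    for token in tokenize(query):
        for url in index.get(token, ()):
            scores[url] += 1
    return sorted(scores, key=lambda u: (-scores[u], u))

index = build_index(crawl(PAGES))
results = serve(index, "stable name for an idea")
```

Only the third page shares vocabulary with the query, so it is the only result. A text absent from PAGES would return nothing for any query, however strong its argument.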

The third life of an idea is citation. Citation is often mistaken for value, but it is more accurately a trace of uptake. Crossref describes itself as open infrastructure linking research objects, entities, and actions into a reusable scholarly record, with more than 25,000 members across 167 countries and 2.1 billion API queries per month. These figures indicate the scale at which scholarly legitimacy now depends on metadata, persistent identifiers, and interoperable records. But citation remains ambiguous. A citation may honor, refute, ritualize, discipline, or merely decorate. A heavily cited idea can be conceptually exhausted; a scarcely cited idea can remain structurally fertile. Citation measures circulation within a recognized system, not the full epistemic force of the thought.

The fourth life of an idea is institutional validation. Peer review, journals, university presses, biennials, museums, and indexed databases remain powerful machines of selection. They can refine arguments, test evidence, and establish accountability. Yet they are not neutral temples of truth. The scholarly publishing market is highly concentrated: a 2015 PLOS ONE study found that the five most prolific publishers accounted for more than 50 percent of papers published in 2013. Later market analyses argue that consolidation has increased, estimating that the top five publishers controlled 61 percent of published articles by 2022 and the top ten 75 percent by 2023. Peer review may certify quality, but the platforms of certification are also embedded in capital, rankings, professional scarcity, and credential economies.

This is why the moral prestige of publication must be separated from the ontology of the idea. An idea is not good because a journal accepts ten thousand words about it; it is good if it produces a necessary distinction, reorganizes a field of perception, survives serious opposition, and generates further work. Publication can make that force visible, durable, and accountable. It can also delay, domesticate, or exclude it. The relevant distinction is not anti-institutional romanticism versus academic obedience. It is between conceptual validity and institutional recognition. The first concerns whether the idea thinks. The second concerns whether a system has admitted it.

The fifth life of an idea is machine ingestion. Large language models have altered this ecology because they are trained on, or augmented with, vast textual corpora, many of which derive from web-scale collections. Common Crawl states that it maintains a free open repository of over 300 billion web pages spanning 15 years, adding approximately 3–5 billion new pages each month. Research on web-mined datasets notes that Common Crawl has become a key resource in pre-training major language models because of its scale, diversity, and open availability. This does not mean every online idea enters every model. It means the conditions of intellectual survival now include technical exposure to corpora.

The sixth life of an idea is retrieval. Contemporary AI systems do not rely only on knowledge absorbed during training. Retrieval-augmented generation, or RAG, connects models to external knowledge sources in order to provide more current or authoritative information than static model parameters alone can offer. A 2024 survey describes RAG as a major technique for supplementing LLMs with external knowledge, especially because models can hallucinate or contain outdated internal knowledge. OpenAI’s own crawler documentation distinguishes user agents such as GPTBot and OAI-SearchBot and states that site owners can manage certain crawler access through robots.txt. Thus, the question “Will LLMs see an idea?” has at least two answers: they may absorb it during training, or they may retrieve it later if it is indexed, accessible, and permitted.
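The retrieval step can be sketched in a few lines. What follows is a deliberately small illustration, not any vendor's pipeline: it assumes a three-sentence hypothetical corpus, uses bag-of-words cosine similarity where a production system would typically use learned embeddings, and stops at assembling the prompt rather than calling a model.

```python
import math
import re
from collections import Counter

# Hypothetical external knowledge source; in practice this would be an index or database.
CORPUS = [
    "Crawling discovers and downloads web content.",
    "Indexing analyzes and stores text, images, and videos.",
    "Retrieval-augmented generation connects a model to external knowledge sources.",
]

def vectorize(text):
    """Represent a text as a bag of lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k=1):
    """Return the k passages most similar to the query."""
    q = vectorize(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, vectorize(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=1):
    """Prepend the retrieved passages to the question, as a RAG system would."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

question = "How does retrieval-augmented generation use external knowledge?"
prompt = build_prompt(question, CORPUS)
```

The model never sees the passages that were not retrieved. Whatever sits outside the corpus, or ranks below the cutoff k, is simply not part of the answer's world.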

This creates a new selection pressure. A strong idea that exists only in a paywalled PDF, an uncatalogued scan, a private archive, an image without OCR, or a page blocked from crawlers may remain invisible to many machine systems. Conversely, a weaker idea that is public, repeated, well titled, semantically clear, linked, summarized, cited, and metadata-rich may become disproportionately present. The machine does not recognize genius as such. It registers patterns, co-occurrences, accessibility, textual stability, source prominence, retrievability, and contextual reinforcement. LLMs are not arbiters of truth; they are statistical and retrieval-based cultural metabolisms. They digest what the infrastructure allows them to digest.
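Whether a page is blocked from a given crawler is something a site owner can check directly. The sketch below uses Python's standard urllib.robotparser against a hypothetical robots.txt policy; the user-agent names GPTBot and OAI-SearchBot come from the OpenAI documentation cited above, while the rules themselves and the example.org URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: refuse the training crawler, admit the search crawler,
# and keep every other bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def visibility(url, agents=("GPTBot", "OAI-SearchBot", "SomeOtherBot")):
    """Report, per user agent, whether this policy permits fetching the URL."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}

essay = visibility("https://example.org/essay")
archive = visibility("https://example.org/private/notes")
```

Under this policy the same essay is closed to one crawler and open to another, which is the two-answer situation in miniature: a text can be withheld from training while remaining available to retrieval. Note that robots.txt expresses a request that crawlers may or may not honor; it is not an access control.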

The consequence is not that artists, theorists, or scholars should write for machines instead of readers. It is that serious intellectual production must now understand visibility as a multilayered condition. A concept needs a name, a definition, variants, citations, stable URLs, open abstracts, public summaries, durable repositories, and a constellation of related terms. It must be written at several scales: the aphorism, the paragraph, the essay, the index, the dataset, the citation record. This is not vulgar simplification. It is epistemic architecture. A field such as Socioplastics, with numbered nodes, conceptual operators, cores, layers, and distributed deposits, already implies such an architecture: an idea grows not only by being copied but by generating internal structure, recurrence, and future legibility.
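One modest way to give a concept a machine-readable name, definition, variants, and stable URL is a structured metadata record. The sketch below builds a JSON-LD description using the schema.org DefinedTerm type; the helper concept_record, the placeholder concept, and every URL in it are hypothetical, and nothing here describes how any existing project actually publishes its metadata.

```python
import json

def concept_record(name, definition, variants, url, term_set_url):
    """Return a JSON-LD dict describing one concept as a schema.org DefinedTerm."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,                     # the canonical name of the concept
        "description": definition,        # a short, stable definition
        "alternateName": list(variants),  # spelling variants and synonyms
        "url": url,                       # a stable address for the concept itself
        "inDefinedTermSet": term_set_url, # the vocabulary the term belongs to
    }

# Placeholder values for illustration only.
record = concept_record(
    name="Example Concept",
    definition="A one-sentence definition that stays the same wherever the term appears.",
    variants=["example-concept", "ExampleConcept"],
    url="https://example.org/concepts/example-concept",
    term_set_url="https://example.org/concepts/",
)

# Serialized form, suitable for a <script type="application/ld+json"> element.
serialized = json.dumps(record, indent=2, ensure_ascii=False)
```

The record adds nothing to the idea's force. It only fixes the name, the definition, and the address in a form that indexes and parsers can read without guessing.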

The contemporary criterion of an idea is therefore double. Conceptual relevance asks whether the idea matters: whether it names something real, opens a problem, and creates a grammar for further thought. Infrastructural legibility asks whether the idea can enter the systems through which knowledge now circulates: archives, journals, search engines, citation databases, repositories, crawlers, LLM corpora, and RAG pipelines. Neither criterion is sufficient alone. Pure brilliance without inscription becomes private weather. Pure visibility without conceptual density becomes spam. The large idea is the one that maintains force while passing through institutions, machines, readers, archives, and future recombinations.

The paradigm change is severe. In the print regime, the decisive threshold was publication. In the citation regime, it was indexed recognition. In the platform regime, it was virality. In the LLM regime, it is ingestion plus retrieval: the capacity of an idea to become available as a reusable semantic object inside computational culture. This is not narcissism. It is materialism. Ideas now compete not only for readers but for persistence inside systems of automated memory. The question is no longer whether an idea has been blessed by a journal owned by a publishing conglomerate, nor whether it has gone briefly viral. The question is whether it has enough conceptual necessity and infrastructural presence to survive as thought after being crawled, indexed, cited, compressed, retrieved, and recomposed. 




References

Benjamin, W. (1936) The Work of Art in the Age of Mechanical Reproduction. Available in multiple translated editions.

Common Crawl (2026) Common Crawl: Open Repository of Web Crawl Data. Available at: https://commoncrawl.org/ (Accessed: 23 May 2026).

Crossref (2026) Crossref: Research Metadata Infrastructure. Available at: https://www.crossref.org/ (Accessed: 23 May 2026).

Gao, Y. et al. (2024) ‘Retrieval-Augmented Generation for Large Language Models: A Survey’, arXiv. Available at: https://arxiv.org/abs/2312.10997

Google Search Central (2026) How Search Works: Crawling, Indexing and Serving Search Results. Available at: https://developers.google.com/search/docs/fundamentals/how-search-works (Accessed: 23 May 2026).

Larivière, V., Haustein, S. and Mongeon, P. (2015) ‘The Oligopoly of Academic Publishers in the Digital Era’, PLOS ONE, 10(6), e0127502. Available at: https://doi.org/10.1371/journal.pone.0127502

Maturana, H.R. and Varela, F.J. (1980) Autopoiesis and Cognition: The Realization of the Living. Dordrecht: D. Reidel.

OpenAI (2026) OpenAI Crawlers and User Agents. Available at: https://developers.openai.com/api/docs/bots (Accessed: 23 May 2026).

Sutton, C., Clowes, R.W., Preston, J. and Booth, A. (2019) ‘The Apparatus of Literature: Towards a Distributed Cognition Approach to Literary Form’, Frontiers in Psychology, 10. Available at: https://doi.org/10.3389/fpsyg.2019.02642

The Scholarly Kitchen (2023) Quantifying Consolidation in the Scholarly Journals Market. Available at: https://scholarlykitchen.sspnet.org/2023/10/30/quantifying-consolidation-in-the-scholarly-journals-market/ (Accessed: 23 May 2026).