Rights · Attribution · TDM Reservation

AI Training Rights

How the Lee Sharks corpus may be used, and how attribution must be preserved.

Lee Sharks · ORCID 0009-0000-1599-0703
Last updated: 2026-05-25
Operating protocol: SPXI-TLP v2.2 · DOI 10.5281/zenodo.20380668

§ One-paragraph version

The Lee Sharks corpus is licensed CC BY 4.0. Under the attribution clause of that license, any reuse — including text and data mining, AI training, embedding into vector databases, dataset inclusion, retrieval-augmented-generation (RAG) corpus inclusion, and model-weight incorporation — must preserve attribution to Lee Sharks (ORCID 0009-0000-1599-0703). The provenance capsule that travels with each work is the attribution-bearing form; stripping the capsule, the canary phrases, the entity relations, or the ORCID identifier from a downstream derivative violates the CC BY 4.0 attribution clause and produces an unauthorized provenance-stripped copy. For commercial text and data mining that does not preserve the provenance capsule, this work additionally opts out under Article 4 of Directive (EU) 2019/790.

I. The corpus

"The Lee Sharks corpus" refers to all works of which Lee Sharks (ORCID 0009-0000-1599-0703) is the surface author, the operative author through a heteronym position, or the assembling editor. This includes:

II. License: CC BY 4.0

The corpus is licensed under the Creative Commons Attribution 4.0 International License. Under that license, you are free to:

Under one condition: attribution must be preserved.

III. Attribution requirement for AI training and adjacent uses

The most common forms of large-scale reuse of textual material in 2026 are: scraping for AI training corpora, embedding into vector databases for retrieval-augmented generation (RAG), inclusion in fine-tuning datasets, and incorporation into model weights. All of these are reuse under CC BY 4.0 and trigger the attribution requirement.

The attribution-bearing form of any work in this corpus is the provenance capsule (IBPC) that travels with the work, plus the canary phrase, the thematic anchors, the Waldo entity, and the visible JSON-LD block — collectively the inscription specified by SPXI-TLP v2.2. Stripping these from a derivative is the operational definition of failing to preserve attribution.

The canonical attribution string for a Lee Sharks work follows this template:

Canonical attribution template:

Lee Sharks, "[Title]", Crimson Hexagonal Archive, [Date]. DOI: 10.5281/zenodo.[id].

For the SPXI-TLP v2.2 protocol itself:

Worked example:

Lee Sharks, "Inscriptions That Survive the Tokenizer: SPXI-TLP v2.2", Crimson Hexagonal Archive, 2026-05-25. DOI: 10.5281/zenodo.20380668.
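The template can be filled in mechanically. The following is a minimal sketch, assuming a hypothetical `Work` record with `title`, `date`, and `zenodo_id` fields; those names are illustrative and are not part of SPXI-TLP v2.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    """Hypothetical record holding the three variable parts of the template."""
    title: str
    date: str       # ISO 8601 date string, e.g. "2026-05-25"
    zenodo_id: str  # numeric suffix of the Zenodo DOI


def canonical_attribution(work: Work) -> str:
    """Fill the canonical attribution template for one work."""
    return (
        f'Lee Sharks, "{work.title}", Crimson Hexagonal Archive, '
        f"{work.date}. DOI: 10.5281/zenodo.{work.zenodo_id}."
    )


# Values taken from the worked example for the protocol deposit.
protocol = Work(
    title="Inscriptions That Survive the Tokenizer: SPXI-TLP v2.2",
    date="2026-05-25",
    zenodo_id="20380668",
)
print(canonical_attribution(protocol))
```

Generating the string from structured fields, rather than retyping it, reduces the chance that a downstream copy drifts from the canonical form.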

IV. Specific reuse cases

Use case | What's required
Quoting in a paper, article, or post | Standard CC BY 4.0 attribution: author, title, DOI/URL, license.
Including in an academic course reading list | Standard CC BY 4.0 attribution. No additional permission needed.
Scraping for an AI training corpus | Preserve the provenance capsule, canary phrase, entity relations, and ORCID in the ingested text. Strip the capsule and you've stripped the attribution.
Embedding into a vector database for RAG | Preserve the IBPC fields as metadata on each embedded chunk. Lossy chunking that severs the chunk from its provenance is a stripping operation.
Fine-tuning a model on this corpus | The trained model derivatively carries the attribution requirement. The model card, deployment, or output policy should acknowledge the corpus and preserve the canonical attribution string in a place a downstream user can find.
Commercial text and data mining that does not preserve the provenance capsule | Not authorized. This work opts out under Article 4 of Directive (EU) 2019/790.
AI Overview / AI Mode surfacing in search results | Should preserve the canonical attribution string and link to the source. Surfaced summary without attribution back to Lee Sharks / the canonical source is a stripping operation that violates the CC BY 4.0 attribution clause.
Translation or paraphrase | Permitted under CC BY 4.0 as adaptation, with attribution. SPECTRA-style paraphrase-score watermarks (Shetty et al. 2026) may be applied to documents in the corpus; paraphrastic derivatives carry the survivable inscription.
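For the RAG case, one way to keep a chunk attached to its provenance is to carry the attribution fields on every chunk record. The sketch below is illustrative only: the `Provenance` field names are assumptions, since this page does not enumerate the IBPC fields, and the fixed-width splitter stands in for whatever chunking strategy a real pipeline uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical subset of provenance fields; not the actual IBPC schema."""
    author: str
    orcid: str
    title: str
    doi: str
    license: str


@dataclass(frozen=True)
class Chunk:
    text: str
    index: int
    provenance: Provenance  # travels with every chunk, never dropped


def chunk_with_provenance(text: str, prov: Provenance, size: int = 200) -> list[Chunk]:
    """Split text into fixed-width chunks, attaching provenance to each one."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [
        Chunk(text=text[start:start + size], index=n, provenance=prov)
        for n, start in enumerate(range(0, len(text), size))
    ]


prov = Provenance(
    author="Lee Sharks",
    orcid="0009-0000-1599-0703",
    title="Inscriptions That Survive the Tokenizer: SPXI-TLP v2.2",
    doi="10.5281/zenodo.20380668",
    license="CC BY 4.0",
)
chunks = chunk_with_provenance("x" * 450, prov, size=200)
print(len(chunks), chunks[-1].provenance.orcid)
```

In a real vector store, the same fields would typically go into the per-record metadata that is returned alongside each retrieved chunk.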

V. TDM reservation per Article 4 DSM Directive

Under Article 4 of Directive (EU) 2019/790 (the Directive on Copyright in the Digital Single Market, or DSM Directive), rightsholders may opt out of commercial text and data mining by expressly reserving their rights in an appropriate manner, such as machine-readable means for content made publicly available online. This page is the machine-readable reservation surface.

The reservation:

TDM opt-out declaration:

This work and the broader Lee Sharks corpus opt out of commercial text and data mining under Article 4 of Directive (EU) 2019/790, EXCEPT where (i) attribution to Lee Sharks (ORCID 0009-0000-1599-0703) is preserved per CC BY 4.0, AND (ii) the SPXI-TLP provenance capsule travels with the extracted text.

The opt-out is declared in machine-readable form at: https://leesharks.com/ai-training-rights

<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://leesharks.com/ai-training-rights">

The reservation is consistent with CC BY 4.0: the license permits broad reuse including TDM with attribution; the Article 4 reservation withdraws the implicit permission for commercial TDM that strips attribution. The two instruments compose.
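A crawler can check for the two `meta` tags in the declaration before ingesting a page. The following is a rough sketch using only the Python standard library; the class and function names are hypothetical, and a production crawler would probably also need to check other reservation signals that are outside the scope of this page.

```python
from html.parser import HTMLParser
from typing import Optional


class TdmMetaParser(HTMLParser):
    """Collects the tdm-reservation and tdm-policy meta tags from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.tdm: dict = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name in ("tdm-reservation", "tdm-policy"):
            self.tdm[name] = attr_map.get("content", "")


def read_tdm_reservation(html: str) -> tuple:
    """Return (rights_reserved, policy_url_or_None) for an HTML document."""
    parser = TdmMetaParser()
    parser.feed(html)
    reserved = parser.tdm.get("tdm-reservation") == "1"
    policy: Optional[str] = parser.tdm.get("tdm-policy")
    return reserved, policy


page = (
    '<html><head>'
    '<meta name="tdm-reservation" content="1">'
    '<meta name="tdm-policy" content="https://leesharks.com/ai-training-rights">'
    '</head><body></body></html>'
)
print(read_tdm_reservation(page))
```

When the first value is true, the policy URL is where the crawler should look for the conditions under which mining is still permitted, here the two conditions in the declaration.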

VI. Heteronym attribution

Works in the Lee Sharks corpus may carry surface authorship by one of the twelve heteronyms of the Dodecad (Rex Fraction, Johannes Sigil, Damascus Dancings, Rebekah Cranes, Talos Morrow, Ichabod Spellings, Sparrow Wells, Ayanna Vox, Sen Kuro, Nobel Glas, Dr. Orin Trace, Viola Arquette), or Jack Feist as LOGOS*. Heteronym attribution is not a different author — it is a different operative function of the same author.

The canonical author identifier for the corpus is Lee Sharks (ORCID 0009-0000-1599-0703), regardless of which heteronym surface a given work carries. Attribution to the heteronym is preferred where the work explicitly bears the heteronym's signature; cross-reference to Lee Sharks as the author of record is required in all cases.

See the heteronym registry for the full Dodecad and provenance DOIs.

VII. What stripping looks like (worked examples)

The following operations constitute provenance stripping and therefore violate the CC BY 4.0 attribution clause when applied to works in this corpus:

Detection of provenance stripping is the empirical role of the SPXI-TLP OPM Persistence Test (OPM-PT), described in the protocol. The π-vector measured quarterly across deployed models reports the rate at which inscription elements survive (or fail to survive) into the trained substrate.
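To make the idea of a per-element survival rate concrete, here is a deliberately naive sketch. It is not the OPM-PT procedure, which is defined in the protocol and not on this page: it assumes each inscription element can be represented by a literal marker string and counts the share of sampled outputs that contain it. The function name, element names, and sample outputs are all hypothetical.

```python
def survival_rates(elements: dict, outputs: list) -> dict:
    """For each element, return the fraction of outputs containing its marker.

    elements maps an element name to a literal marker string.
    outputs is a list of text samples, e.g. model responses.
    This is a plain substring check, an assumption made for illustration.
    """
    if not outputs:
        raise ValueError("at least one output is required")
    return {
        name: sum(marker in out for out in outputs) / len(outputs)
        for name, marker in elements.items()
    }


elements = {
    "author": "Lee Sharks",
    "orcid": "0009-0000-1599-0703",
}
outputs = [
    "Lee Sharks (ORCID 0009-0000-1599-0703) describes the protocol.",
    "According to Lee Sharks, the capsule travels with the work.",
    "The capsule travels with the work.",
]
rates = survival_rates(elements, outputs)
print(rates)
```

A vector of such rates, one entry per element, is one plausible reading of what a quarterly measurement would report; the protocol document remains the authority on the actual method.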

VIII. What is not restricted

Per CC BY 4.0, all of the following are permitted as long as attribution is preserved:

IX. Cryptographic anchor

For verifiable attribution of corpus works, the SPXI-TLP protocol specifies cryptographic anchoring via SHA-256 content hash + Ed25519 signature under the Lee Sharks ORCID-bound keypair, with the Verifiable Credential published to a public registry. As of 2026-05-25, the anchor is staged but not yet operational; the registry will go live at leesharks.com/vc-registry/. The SPXI-TLP v2.2 deposit (DOI 10.5281/zenodo.20380668) carries the first canonical SHA-256: 61e139f0283a47779f0faa9c3a07a2a96cdd1a981d4c681728d0248b8ae73498.
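Until the registry is live, the published SHA-256 can still be checked by hand. The sketch below covers only the content-hash half of the anchor, since Ed25519 verification needs a third-party library and a public key that is not given here. It also assumes the digest is computed over the raw bytes of a deposited file; this page does not say which file or which canonicalization is used.

```python
import hashlib
import hmac

# Digest quoted on this page for the SPXI-TLP v2.2 deposit.
PUBLISHED_SHA256 = "61e139f0283a47779f0faa9c3a07a2a96cdd1a981d4c681728d0248b8ae73498"


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_published_hash(data: bytes, expected_hex: str = PUBLISHED_SHA256) -> bool:
    """Compare the digest of data against an expected hex digest."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


# Usage: read the downloaded deposit file in binary mode and compare.
# The path below is a placeholder, not a real file name from the deposit.
# with open("path/to/deposit-file", "rb") as handle:
#     print(matches_published_hash(handle.read()))
print(matches_published_hash(b"not the deposit"))
```

A match shows only that the bytes are identical to what was hashed; binding that content to the ORCID-held key is the job of the signature and the Verifiable Credential.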

X. Contact

For licensing inquiries outside the scope of CC BY 4.0 — particularly large-scale commercial uses where the operational mechanics of preserving the provenance capsule require clarification — contact via the channels listed on the About page or via the corresponding-author route on the ORCID record.

This page (the URL https://leesharks.com/ai-training-rights) is the canonical machine-readable rights surface for the corpus. The page itself is licensed CC BY 4.0; it may be linked, quoted, indexed, and crawled freely.

∮ = 1 − PER