Content Safety, Moderation & In-Flight Masking
Our AI pipeline aims to maximize creativity while honoring the Responsible-AI policies of all providers we use. This page explains:
- When a passage may be blocked or soft-flagged
- How our operation-aware moderation works (transform vs generate)
- What our in-flight masking does (prompt-injection guardrails)
- What you will see in your final HTML/DOCX
- Our 0–4 audience rating rubric
1) Why a Passage Might Be Blocked
| Trigger Stage | Examples of Disallowed Content | Decision Engine |
|---|---|---|
| Moderation API (pre-check) | sexual/minors, graphic sex, glorified violence, explicit hate | OpenAI Moderation model with custom thresholds |
| Foundation Model | Provider-specific safety rules | Nova / Claude / Command / GPT etc. |
2) The Safety Pipeline
2.1 Operation-Aware Moderation (transform vs generate)
We treat different services differently:
- Transform ops (translate • convert • edit • analytics • writing coach) → Mostly permissive
  - Soft-flags continue to the model (often in “safe-mode”: no tools, strict formatting, token caps).
  - Hard-flags block the chunk immediately (e.g., `sexual/minors`, or scores above custom thresholds).
- Generate ops (ghostwriting • outline • outline section • prompt/sample) → Tighter
  - High-confidence violations are blocked.
  - The most severe categories (e.g., `sexual/minors`) always block.
Outcomes from pre-check
- Pass – `flagged: false` ⇒ proceed to AI processing.
- Soft flag – `flagged: true` but below thresholds ⇒ proceed in safe-mode.
- Hard flag – forced category or score over threshold ⇒ chunk is not sent to the model; we echo the original with a violation wrapper so you can edit/resubmit.
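The pass / soft-flag / hard-flag outcomes above can be sketched as a small decision function. This is an illustrative sketch only: the category names, threshold values, and the `precheckDecision` helper are hypothetical, not our production configuration.

```javascript
// Hypothetical sketch of the pre-check decision. Categories and thresholds
// below are examples, not our real configuration.
const HARD_BLOCK_CATEGORIES = new Set(["sexual/minors"]); // always blocks
const SOFT_THRESHOLDS = { violence: 0.85, hate: 0.8 };    // example custom thresholds

function precheckDecision(op, result) {
  // result mirrors a moderation response: { flagged, categories, scores }
  if (!result.flagged) return "pass";
  for (const cat of Object.keys(result.categories)) {
    if (!result.categories[cat]) continue;
    if (HARD_BLOCK_CATEGORIES.has(cat)) return "hard";
    const limit = SOFT_THRESHOLDS[cat];
    if (limit !== undefined && result.scores[cat] > limit) return "hard";
  }
  // Generate ops are tighter: high-confidence violations also block.
  const maxScore = Math.max(0, ...Object.values(result.scores));
  if (op === "generate" && maxScore > 0.9) return "hard";
  return "soft"; // proceed in safe-mode
}
```

A soft flag on a transform op still reaches the model; the same scores on a generate op may not.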
2.2 🛡️ In-Flight Masking (Prompt-Injection Guardrails)
Before we call any model, we neutralize specific prompt-injection cues inside the AI call only and then faithfully restore them in the final output we deliver to you. Your manuscript on disk is never altered.
What gets masked (examples)
- Meta-instruction phrases (singular & plural; optional “all”), e.g. `ignore previous instructions`, `disregard prior prompts`, `override previous instructions`
- Direct meta references: `system prompt`, `developer instruction(s)`
- Tool/side-effect coaxing: `function call`, `tool call`, `exfiltrate`
- Role-like tags: `<system>`, `<assistant>`, `<developer>`
- Base64 header: `data:text/plain;base64,` (payload not decoded; only the header is neutralized)
- Zero-width / RTL control characters: `\u200B`, `\u200C`, `\u200D`, `\u2060`, `\uFEFF`, `\u202E`
What does not get masked
- Lone ordinary words such as `ignore`, `service`, `prompt` when they don’t form the sequences above
- Normal prose that merely mentions technology/tools without the exact cues
Important effects
- The model won’t see the exact masked sequences during processing; it sees safe placeholders.
- We restore the original words in the result text we stream to you—so your story remains intact.
- This only runs in-flight. Your original files/sections are unchanged.
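The mask-then-restore round trip can be sketched as follows. The cue list and the `[[MASK_n]]` placeholder format are illustrative (the real pattern set is larger and multilingual), and `mask`/`restore` are hypothetical helper names.

```javascript
// Illustrative subset of injection cues; the production list is far larger.
const INJECTION_CUES = [
  /ignore (all )?previous instructions?/gi,
  /system prompt/gi,
  /[\u200B\u200C\u200D\u2060\uFEFF\u202E]/g, // zero-width / RTL controls
];

function mask(text) {
  const vault = []; // original sequences, kept for restoration
  let masked = text;
  for (const cue of INJECTION_CUES) {
    masked = masked.replace(cue, (match) => {
      vault.push(match);
      return `[[MASK_${vault.length - 1}]]`; // safe placeholder the model sees
    });
  }
  return { masked, vault };
}

function restore(maskedText, vault) {
  // Faithfully put the original sequences back into the delivered output.
  return maskedText.replace(/\[\[MASK_(\d+)\]\]/g, (_, i) => vault[Number(i)]);
}
```

Note that only the exact sequences are swapped; a lone word like `ignore` passes through untouched, and `restore(mask(text).masked, mask(text).vault)` is lossless.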
Languages covered (initial set)
We actively look for injection cues in many languages and will continue to add more guardrails as we mature. Coverage is evolving; if you see missed cases, please open a ticket with examples.
2.3 Foundation Model (per-chunk)
After masking, we pass the chunk to the selected model.
Possible outcomes:
- Input refusal – the model rejects the chunk; we echo your original chunk with a warning and move on.
- Output refusal – the model starts, then aborts; we drop partial output, echo your original chunk with a warning, and move on.
Note: Only the offending chunk is substituted; your document keeps streaming.
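The substitution behaviour can be sketched like this; `processChunks` and `callModel` are hypothetical stand-ins for the real pipeline and provider call.

```javascript
// Sketch: a refused chunk is echoed back verbatim inside a violation
// wrapper, and the rest of the document keeps streaming.
function processChunks(chunks, callModel) {
  const out = [];
  for (const chunk of chunks) {
    try {
      out.push(callModel(chunk)); // may throw on input/output refusal
    } catch (err) {
      // Drop any partial output; substitute only the offending chunk.
      out.push(`<div class="content-policy-violation">${chunk}</div>`);
    }
  }
  return out;
}
```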
3) Audience Rating (0–4)
Bookcicle assigns each work an audience/safety rating from 0 (all ages) to 4 (explicit adult). We rate to the highest level triggered by any dimension. When uncertain, we round up.
0 — Safe for All (Early Readers)
- Violence: none; no peril or fear.
- Romance/sex: none; family affection only.
- Language: clean.
- Substances: none.
- Themes: gentle everyday topics; no upsetting content.
1 — Mild / Family (≈8+)
- Violence: cartoonish or implied peril; no injuries.
- Romance/sex: crushes, hand-holding; no sensual detail.
- Language: very mild (“darn/heck”), infrequent.
- Substances: none; neutral mentions OK.
- Themes: brief, non-intense mentions of loss or bullying.
2 — Teen (≈13+)
- Violence: non-graphic fights; injury without gore; on-page peril.
- Romance/sex: kissing; suggestive moments; “fade-to-black” only.
- Language: moderate profanity; no slurs as attack.
- Substances: brief mentions/off-page use; no glamorization.
- Themes: bullying, discrimination, mental health, grief handled sensitively; sexual assault may be referenced non-graphically.
3 — Mature (≈16+)
- Violence: stronger, occasional blood; non-torture cruelty.
- Romance/sex: intimate situations; non-explicit (no anatomical detail).
- Language: frequent strong profanity; slurs only in contextual/historical depiction.
- Substances: on-page alcohol/drug use; consequences shown.
- Themes: abuse, trauma, systemic hate; sexual assault referenced non-graphically; high-intensity horror without gore.
4 — Adult / Explicit (18+)
- Violence: graphic/gory; torture depicted.
- Sex: explicit sexual acts or fetish/kink detail.
- Language: pervasive strong profanity; slurs in hostile context.
- Substances: detailed hard-drug use/abuse.
- Themes: sexual assault depicted; extreme psychological/body horror.
Never Allowed (any rating)
- Sexual content involving minors or sexualization of minors.
- Step-by-step instructions for self-harm/suicide or hard-drug manufacture.
- Direct incitement to violence or hate against protected groups.
Reason Codes (attach to ratings)
violence: none | mild | moderate | graphic
sexual_content: none | alluded | non_explicit | explicit
language: clean | mild | moderate | strong | severe
substances: none | mention | use | abuse
themes: [bullying, grief, mental_health, discrimination, assault, horror]
Example (JSON)
{
"audienceRating": {
"level": 2,
"label": "Teen",
"reasons": {
"violence": "moderate",
"sexual_content": "alluded",
"language": "moderate",
"substances": "mention",
"themes": [
"grief",
"bullying"
]
}
}
}
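Aggregating reason codes into a rating can be sketched as a max over dimensions, since we rate to the highest level triggered by any one of them. The `LEVEL_FLOOR` mapping below is illustrative and does not reflect the exact production rubric.

```javascript
// Hypothetical mapping from each reason-code value to the minimum level it
// forces. Values are illustrative examples only.
const LEVEL_FLOOR = {
  violence:       { none: 0, mild: 1, moderate: 2, graphic: 4 },
  sexual_content: { none: 0, alluded: 2, non_explicit: 3, explicit: 4 },
  language:       { clean: 0, mild: 1, moderate: 2, strong: 3, severe: 4 },
  substances:     { none: 0, mention: 2, use: 3, abuse: 4 },
};

function audienceLevel(reasons) {
  // Rate to the highest level triggered by any dimension.
  let level = 0;
  for (const [dim, value] of Object.entries(reasons)) {
    const floors = LEVEL_FLOOR[dim];
    if (floors && value in floors) level = Math.max(level, floors[value]);
  }
  return level;
}
```

Under this sketch, the JSON example above (moderate violence, alluded sexual content, moderate language, substance mentions) resolves to level 2, “Teen”.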
4) What You’ll See in Exports
4.1 HTML Export
<div class="content-policy-violation">
<!-- original chunk preserved exactly -->
<p>She reached up and …</p>
</div>
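Producing that wrapper is a simple string operation; a minimal sketch (the `wrapViolation` helper name is hypothetical, but the class name matches the export above):

```javascript
// Wrap a rejected chunk for HTML export. The original chunk is preserved
// exactly; nothing inside it is escaped or rewritten.
function wrapViolation(originalHtml) {
  return `<div class="content-policy-violation">\n${originalHtml}\n</div>`;
}
```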
4.2 Bookcicle Viewer
A small rejected chunk is wrapped and visually flagged:
<article style={{margin: 30, backgroundColor: "oklch(27.8% 0.033 256.848)", borderRadius: 8, padding: 15}}>
<p>…clean content…</p>
<div
className="content-policy-violation"
style={{
border: "1px solid oklch(75% 0.183 55.934)",
borderRadius: 8,
padding: 16,
margin: 15,
marginTop: 35,
position: "relative",
}}
role="alert"
>
<span
style={{
position: "absolute",
top: "-1.5em",
left: "1em",
background: "transparent",
padding: "0 2px",
fontSize: "0.875rem",
color: "#666",
lineHeight: 1,
}}
>
Content Policy Violation
</span>
{/* original chunk preserved exactly */}
<p>…original text…</p>
</div>
<div id="lipsum">…other content…</div>
</article>
4.3 DOCX / Word / LibreWriter
- [Policy Violation] Review this paragraph – model refused to process
If the section was skipped at moderation time (pre-check), the note reads:
- [Policy Violation — pre-check] Blocked by Moderation
Your original text remains intact so you can edit/resubmit.
5) FAQs
Q • Will Bookcicle ever delete my text? A: No. We never delete your words. Blocked passages are echoed back verbatim with a clear wrapper visible in the Bookcicle results viewer.
Q • What is “in-flight masking” and does it change my manuscript? A: We temporarily replace only specific prompt-injection sequences (e.g., “ignore previous instructions”, “system prompt”) inside the AI call so the model can’t be tricked. We restore your exact wording in the streamed result, and your files on disk are never changed. Occasionally, though, masking can leave a passage looking odd in the AI output, so the result may need human review.
Q • Will single words like “ignore” or “service” be masked? A: No. Masking targets precise sequences (see list above), not isolated words.
Q • Why did the model seem to “ignore” the phrase ignore previous instructions? A: By design. That phrase is neutralized during processing to protect your run and our providers, then restored in the final output you receive.
Q • Do I pay for tokens if a chunk is refused? A: Some providers (e.g., certain Bedrock models) may charge for input tokens and any generated output up to refusal. We try to minimize this by catching hard-flags early.
Q • Can I disable the masking layer? A: No. It protects your project and our vendor relationships. It is transparent (lossless) and in-flight only.
✈️ Creating bold stories sometimes means flying close to the sun—our safety pipeline makes sure you don’t get burned.