The engineering decisions that made Sant Chat safe to install
Six engineering decisions that make Sant Chat safe to install on a production WordPress site. Session validation, sanitisation, and data boundaries.
19 April 2026 · 13 min read
A WordPress chatbot is not a widget. It is an inbound channel from the open web into a production database, running inside a content management system that powers roughly forty percent of public sites. Every turn of a visitor conversation is a POST request. Every rating is a state change. Every voice message is an upload. Every lead captured is a row written into a site the plugin owner has to answer for. The attack surface of a chatbot plugin is closer to a comment system than a contact form, and the industry has spent twenty years learning what happens to WordPress comment systems that skip validation.
Most chatbot plugins in the ecosystem do not treat themselves as inbound channels. They accept input by default. They write to the database without checking who is asking. They sanitise with whatever function the developer happened to know. They default to sending data off the site, because handling it locally is harder.
Sant Chat AI was built on the opposite assumption. Any inbound channel from the open web is hostile until the code proves otherwise. The site owner is the customer, not the vendor. Data stays local unless it explicitly has to leave, and when it leaves, the path is narrow and nameable.
This post walks through six decisions that express that posture, named in engineering terms rather than marketing terms, with the tradeoff called out for each. The plugin source is public through the WordPress.org SVN repository at plugins.svn.wordpress.org/sant-chat-ai/trunk/. The server side is closed, but the patterns described below are accurate descriptions of the production code.
Why a chatbot is an inbound channel, not a widget
A static contact form accepts one type of input, sends one email, and writes one row. Its attack surface is the shape of that one form. A chatbot accepts an open-ended stream of inputs, writes many rows across many turns, may deliver notifications, may capture leads, may store conversation state, may upload voice audio, and may rate or correct its own outputs.
That is not a larger version of a contact form. That is a different class of surface. The mental model that belongs here is comments, not forms. Every guardrail WordPress comment systems learned the hard way applies. Session boundaries. Nonce verification where it matters. Field by field sanitisation. Validation of value shape before content. Defaults that lean safer.
The decisions below came from treating Sant Chat AI as an inbound channel from day one, and auditing it against the WordPress Plugin Compliance Checker before every release. The tooling, and how Sant Limited built it, is covered in the companion post on the WordPress plugin compliance checker Sant built on itself. What follows here is what the tooling kept honest.
Stateful endpoints need more than sanitisation
The rating endpoint is the clearest example. When a visitor marks a chatbot response as helpful or not helpful, the plugin writes a rating against a specific session and a specific message. That is a stateful write. The session identifier in the request has to match the session the visitor actually owns. Otherwise anyone on the internet can send a crafted request and rate any message in any session on any site running the plugin. That is not a hypothetical exploit. That is the default behaviour of a rating endpoint that only checks whether the input is sanitised.
Session ownership validation on stateful endpoints is the specific pattern. Before the plugin writes the rating, the handler confirms that the session identifier in the request belongs to the browser making the request. The check is a few lines of code. The payoff is the difference between an audit log worth trusting and an audit log that has been vandalised by bots.
The broader principle is that sanitisation is a mechanism for cleaning inputs. It is not a mechanism for authorising actions. Those are different concerns, and conflating them is one of the most common security mistakes in the WordPress plugin ecosystem. The compliance checker flags stateful writes that lack an ownership check. The fix is cheap. Adoption across the ecosystem is low.
Sant Chat applies this pattern to every endpoint that writes state. Ratings, lead capture, session updates, correction submissions. Each handler verifies ownership before the write. Routes are registered in includes/class-rest-api.php. The handlers live in class-ajax-handlers.php, which covers the twenty-one sync endpoints the widget uses.
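A minimal sketch of that ownership check follows, written in Python for brevity although the plugin itself is PHP. It assumes each chat session is bound to a random token held by the visitor's browser (for example in a cookie), with only a hash of that token stored server side. The dictionaries and function names are hypothetical stand-ins for the plugin's tables and handlers, not Sant Chat's code.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-ins for the plugin's custom tables.
SESSION_OWNERS: dict[str, str] = {}  # session_id -> SHA-256 of the browser's token
RATINGS: dict[tuple[str, str], int] = {}  # (session_id, message_id) -> rating


def _hash_token(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def start_session() -> tuple[str, str]:
    """Create a session; the token goes to the browser, only its hash is kept."""
    session_id = secrets.token_hex(16)
    token = secrets.token_urlsafe(32)
    SESSION_OWNERS[session_id] = _hash_token(token)
    return session_id, token


def owns_session(session_id: str, browser_token: str) -> bool:
    """True only if the browser's token matches the one bound to this session."""
    expected = SESSION_OWNERS.get(session_id)
    if expected is None:
        return False
    # Constant-time comparison, so response timing does not reveal partial matches.
    return hmac.compare_digest(expected, _hash_token(browser_token))


def handle_rating(session_id: str, message_id: str, rating: int, browser_token: str) -> bool:
    """Authorise first, then write. A clean input is not an authorised one."""
    if not owns_session(session_id, browser_token):
        return False  # crafted request naming someone else's session
    RATINGS[(session_id, message_id)] = rating
    return True
```

A request that names a real session but presents another visitor's token returns False and writes nothing, however well sanitised its fields are.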
Validate the shape of the value, not just its cleanliness
WordPress REST endpoints accept a validate_callback parameter for every registered field. Almost nobody uses it. The result is a generation of plugins that sanitise a rating field as text, then accept any text at all as a valid rating value.
Sant Chat uses validate_callback on the shape of every registered value. A rating is 1 or minus 1. If the value is 0, or 2, or the string "yes", the request is rejected before it reaches the sanitisation layer. A voice audio upload has to be actual base64. If it is a string that happens to contain the right characters but fails to decode, the request is rejected. A session identifier has to match the session format. If it does not, the request never hits the database.
This is cheap to add. The callback is declared in the same register_rest_route call that registers the route. The effect is that malformed requests die early, before they reach the code that assumes a well-formed value. Every defence has a cost and a payoff. This one has almost no cost, and a large payoff in shrinking the surface that the rest of the code has to defend.
Sant Chat uses validate_callback on every registered field on every endpoint. Shape first, content second. The two are not interchangeable, and validating shape is not a subset of sanitising content.
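The shape checks described above can be sketched as plain validator functions, one per field, that run before any sanitiser touches the value. This is an illustrative Python version, not the plugin's PHP validate_callback code. The session identifier format (32 lowercase hex characters) is an assumption made for the example; the real format is not described in this post.

```python
import base64
import binascii
import re

# Assumed format for illustration only: 32 lowercase hex characters.
SESSION_ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def is_valid_rating(value: object) -> bool:
    # Exactly 1 or -1. bool is excluded explicitly because True == 1 in Python.
    return type(value) is int and value in (1, -1)


def is_valid_base64(value: object) -> bool:
    if not isinstance(value, str) or not value:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of skipping them; bad padding or length also raises.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def is_valid_session_id(value: object) -> bool:
    return isinstance(value, str) and SESSION_ID_PATTERN.fullmatch(value) is not None


# One shape check per registered field.
VALIDATORS = {
    "rating": is_valid_rating,
    "audio": is_valid_base64,
    "session_id": is_valid_session_id,
}


def rejected_fields(params: dict) -> list[str]:
    """Names of fields that are unknown or have the wrong shape; empty means pass."""
    return [
        name
        for name, value in params.items()
        if name not in VALIDATORS or not VALIDATORS[name](value)
    ]
```

A rating of 0, 2 or "yes" fails before any sanitiser runs, and a string such as "abc" is rejected as audio because it uses only base64 characters yet does not decode.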
Sanitise the field for what the field actually holds
The WordPress sanitisation function most developers reach for is sanitize_text_field. It strips tags, removes line breaks and tabs, and collapses extra whitespace. It is fine for a short plain text field. It is wrong for almost everything else, and it is still the most common sanitisation call in the ecosystem.
A URL is not a plain text field. It is a URL, and it needs esc_url_raw to be sanitised for storage. An email is not a plain text field. It is an email, and it needs sanitize_email. A textarea with intentional line breaks is not a plain text field. It needs sanitize_textarea_field to preserve those line breaks. An integer is not a plain text field. It needs a cast and a range check. A JSON blob is not a plain text field and cannot be sanitised as one, which is the subject of the next section.
Sant Chat uses the specific sanitisation function for each field type across its settings surface. URLs for webhook destinations. Emails for notification addresses. Textareas for long form content. Integers for limits and counts. The engineering cost is about ten seconds per field, which is the time it takes to pick the right function. The payoff is that the sanitiser actually does what the field needs.
Blanket sanitize_text_field across a settings form is not a shortcut. It is a silent bug. It corrupts legitimate input, and it provides no protection against input that was not text to begin with.
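As a sketch of field-aware sanitisation, the Python below maps each hypothetical settings field to a cleaner for its type. The cleaners are deliberately simplified stand-ins for WordPress's sanitize_text_field, sanitize_textarea_field, esc_url_raw and sanitize_email; in a real plugin the WordPress functions should be used, not reimplemented. The field names are invented for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

TAG = re.compile(r"<[^>]*>")
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def clean_plain_text(value: str) -> str:
    # Single-line field: drop tags and collapse every run of whitespace, newlines included.
    return " ".join(TAG.sub("", value).split())


def clean_textarea(value: str) -> str:
    # Multi-line field: same tidy-up per line, but the line breaks survive.
    return "\n".join(" ".join(line.split()) for line in TAG.sub("", value).splitlines())


def clean_url(value: str) -> str:
    # Keep only absolute http(s) URLs with a host; anything else becomes empty.
    parts = urlsplit(value.strip())
    return parts.geturl() if parts.scheme in ("http", "https") and parts.netloc else ""


def clean_email(value: str) -> str:
    candidate = value.strip()
    return candidate if EMAIL.fullmatch(candidate) else ""


def clean_int(value: object, low: int, high: int) -> Optional[int]:
    # Cast, then clamp to the allowed range; non-numeric input yields None.
    try:
        number = int(value)
    except (TypeError, ValueError):
        return None
    return max(low, min(high, number))


# Hypothetical settings fields, each paired with the cleaner for its type.
SETTINGS_CLEANERS = {
    "bot_name": clean_plain_text,
    "welcome_message": clean_textarea,
    "webhook_url": clean_url,
    "notification_email": clean_email,
    "max_messages_per_session": lambda v: clean_int(v, 1, 500),
}


def clean_settings(raw: dict) -> dict:
    """Clean known fields with their own cleaner; unknown fields are dropped."""
    return {key: SETTINGS_CLEANERS[key](value) for key, value in raw.items() if key in SETTINGS_CLEANERS}
```

Given the same two-line input "Line one\nLine <b>two</b>", clean_plain_text returns "Line one Line two" while clean_textarea keeps the break. That flattening is the silent corruption a blanket plain-text sanitiser causes.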
JSON sanitisation has an order that matters
Some Sant Chat endpoints accept JSON payloads. The common mistake in plugin code is to sanitise the JSON string first and then decode it. That produces nothing useful. The sanitiser sees a string of characters, not a structure, and cannot know which characters matter. Worse, it can corrupt the JSON itself, turning valid payloads into decode errors.
The correct order is to decode the JSON first, then sanitise the parsed structure field by field. Every string value in the parsed object receives the sanitisation appropriate to its field type. Nested objects are walked recursively. The result is a clean structure the rest of the code can trust.
This is the kind of decision that looks pedantic until it is wrong. When the damage does not break the decode outright, the failure mode of sanitising JSON as a string is silent. The payload looks fine. The decode succeeds. The code downstream has no idea that it is reading an input that was never actually checked. That is how injection bugs survive audit.
Sant Chat decodes first and sanitises the parsed object. Every time. The cost is a helper function. The payoff is that JSON inputs are actually validated.
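The difference in order is easy to demonstrate. In this Python sketch, a naive tag stripper stands in for a string-level sanitiser, and a single strip_tags call stands in for the per-field sanitisers a real handler would apply. None of these names come from the plugin.

```python
import json
import re

TAG = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    # Naive tag stripper standing in for a string-level sanitiser.
    return TAG.sub("", text)


def sanitize_then_decode(raw: str):
    """Wrong order: the sanitiser sees one long string, not a structure."""
    return json.loads(strip_tags(raw))


def sanitize_tree(value):
    """Walk a decoded JSON structure and clean every string, keys included."""
    if isinstance(value, dict):
        return {strip_tags(key): sanitize_tree(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_tree(item) for item in value]
    if isinstance(value, str):
        return strip_tags(value)
    return value  # numbers, booleans and None pass through unchanged


def decode_then_sanitize(raw: str):
    """Right order: parse first, then sanitise the parsed structure field by field."""
    return sanitize_tree(json.loads(raw))
```

For the payload '{"a": "x < y", "b": "z > w"}', sanitize_then_decode treats the < in one value and the > in the next as a single tag and deletes everything between them. It returns {"a": "x  w"}: field b is gone, the JSON is still valid, and nothing raised. decode_then_sanitize returns both fields intact.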
Opt-in by default for anything that leaves the site
Version 1.0.0 of Sant Chat shipped with email notifications turned on by default. A site owner installed the plugin, and if the WordPress settings included an admin email, the plugin would send notifications to that email on lead capture events. The default was wrong.
The site owner had not agreed to have their admin email used by a plugin they had just installed. Presence in the settings is not consent. The fix shipped in version 1.0.6 in early April 2026. Email notifications are now opt-in. The site owner has to turn the feature on and enter the destination address.
That is a small change in the settings UI. It is a much larger change in the posture. The principle is that nothing should leave the site without the site owner actively choosing to send it. Not email addresses. Not visitor data. Not usage telemetry beyond what is strictly required for the service to operate.
This is the default Sant Limited applies across its product surface. Features that send data off the site ship switched off. The site owner decides when to turn them on. The engineering cost is a single boolean and a checkbox. The payoff is that the site owner is actually in charge.
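The opt-in rule comes down to a default and a guard. In this hypothetical Python settings object, notifications start disabled and have no destination. Nothing is sent unless the owner has both switched the feature on and entered an address. There is deliberately no fallback to the site's admin email, which was the 1.0.0 behaviour. The names are illustrative, not the plugin's option keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotificationSettings:
    # Off on install: an admin email existing elsewhere in the settings is not consent.
    email_enabled: bool = False
    # Empty until the site owner types an address; no fallback to the admin email.
    destination: str = ""


def notification_recipient(settings: NotificationSettings) -> Optional[str]:
    """Address to notify on a lead capture event, or None to send nothing."""
    if settings.email_enabled and settings.destination.strip():
        return settings.destination.strip()
    return None
```

Both conditions have to hold. Enabling the feature without an address, or entering an address without enabling it, still sends nothing.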
What stays on the WordPress site, what crosses to Sant's service, and what is never stored
The clearest way to describe the data handling is by boundary. Two zones. The site's own WordPress database is one zone. Sant's service is the other. The line between them is visible, auditable, and intentionally narrow.
Inside the site's own WordPress database, in custom tables created by the plugin, live the lead records captured by the chatbot, the chat session logs, the visitor conversation state, and the per-site settings. This data stays on the site. The site owner controls it. Standard WordPress backup and export tools handle it the same way they handle posts and comments. If the site owner uninstalls the plugin, the plugin's data is removed along with it.
What crosses to Sant's service deserves specifics. Each visitor message is sent to Sant's service so the chatbot can answer it, but it is not persisted in full. A hundred-character snippet of the visitor's query is retained alongside token counts and retrieval metadata, for analytics and billing, with basic regex redaction applied before storage. That regex catches common patterns such as email addresses and US-format phone numbers. It does not catch every possible identifier, and describing the output as fully anonymised would overclaim. The full visitor message and the full model response are not stored after the request completes.
Optional email notifications, if the site owner has opted in, cross through the delivery layer. That is the extent of the outbound path from a routine chatbot turn.
What is never stored on Sant's service: full visitor message content, full model responses, voice audio, and lead record contents.
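Here is a sketch of the snippet step, assuming simple patterns for email addresses and US-format phone numbers. These regexes are illustrative, not the production patterns. Like the production redaction, they will miss identifiers that do not fit them. The sketch redacts before truncating to 100 characters, because truncating first could cut an address in half, so the pattern no longer matches and a fragment survives into storage. That ordering is this example's choice, not a documented detail of Sant's service.

```python
import re

# Illustrative patterns only; like any regex redaction they miss some identifiers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# US-format numbers such as 555-123-4567, (555) 123-4567 or +1 555 123 4567.
US_PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")

SNIPPET_LENGTH = 100


def analytics_snippet(message: str) -> str:
    """Redact first, then truncate, so a cut cannot split an identifier the regex would catch."""
    redacted = EMAIL_RE.sub("[email]", message)
    redacted = US_PHONE_RE.sub("[phone]", redacted)
    return redacted[:SNIPPET_LENGTH]
```

For example, "Email me at jane.doe@example.com or call (555) 123-4567 today" becomes "Email me at [email] or call [phone] today".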
Sant Chat's data handling choices sit inside the New Zealand Privacy Act 2020, the framework Sant Limited operates under. The same principles map closely to the Australian Privacy Principles and to GDPR's data minimisation and storage limitation principles, which matter to Australian and European customers respectively. The posture is the point. Data stays local unless it has to leave. When it leaves, the path is narrow enough to describe on one screen, and the description matches what the code actually does.
Where this fits in Sant Launch services
The decisions above are engineering decisions, and they live inside the Launch phase of the Sant methodology. Launch services covers the work of shipping a site, a product, or a tool to the public web with the same posture Sant Chat was built on. Inbound channels are treated as inbound channels. Stateful endpoints are treated as stateful endpoints. Data boundaries are drawn before the first visitor arrives, not after the first incident.
Sites already in production, or products already in market, move into the Scale phase. Sant Cloud is where ongoing compliance maintenance, security patching, and regulatory response live. The six decisions above were maintained across nine compliance releases in the week of 1 to 6 April 2026, eight of them on a single day. That cadence is the kind of discipline Cloud plans are designed to keep, without the site owner having to schedule each cycle manually.
A future post in this series will cover what happens when the same discipline meets regulated contexts. Health records, financial services, government, and other environments where explicit controls sit on top of the baseline. Those cases deserve their own treatment rather than being squeezed into a post about chatbot plugin engineering.
Frequently asked questions
Does Sant Chat store visitor conversations on its servers?
Partially, and the specifics matter. A hundred character snippet of each visitor query is retained on Sant's service alongside token counts and retrieval metadata for analytics and billing, with basic regex redaction applied first. The full message and the full model response are not stored after the request completes. The full conversation log stays on the WordPress site, in custom tables created by the plugin.
Can a visitor rate or correct a message they did not send?
No. Every stateful endpoint on Sant Chat, including ratings, corrections, and lead capture, verifies that the session identifier in the request belongs to the browser making the request before writing anything. A crafted request that names someone else's session is rejected before the write.
Where is the plugin source code?
The plugin source is published through the WordPress.org SVN repository, linked from the Sant Chat AI listing on wordpress.org. The patterns described in this post are visible there. The server side that receives chatbot messages is a closed codebase, and the descriptions above are accurate to the production code.
Is Sant Chat compliant with the New Zealand Privacy Act 2020?
Sant Chat's data handling sits inside the Privacy Act 2020 framework, and the decisions described above are designed to support compliance rather than undermine it. A formal compliance claim is something Sant Limited will publish when it can be backed by a full accompanying audit position, and not before.
How does Sant Chat compare to other WordPress chatbot plugins on security?
The auditable differences are the six decisions described in this post: session ownership validation on stateful endpoints, validate_callback on every registered field, field-aware sanitisation across settings, decode-first JSON sanitisation, opt-in defaults for outbound data, and a narrow, documented data boundary. The WordPress Plugin Compliance Checker that Sant Limited published separately flags the absence of these patterns in the code it audits. Anyone installing a plugin can run the checker against their own stack.
Closing
The decisions above are not dramatic. They do not require new frameworks or novel architecture. They require picking the right function for the field, validating the shape before sanitising the content, treating stateful endpoints as stateful, defaulting to opt in for anything that leaves the site, sanitising JSON in the correct order, and describing the data boundary honestly rather than flattering it.
The reason to name them explicitly is that the default in the WordPress plugin ecosystem is the opposite of each one. Blanket sanitize_text_field. Missing validate_callback. Stateful writes without session ownership checks. Opt out defaults for outbound data. JSON sanitised as a string. Plugins that ship each of those defaults are installed on millions of sites.
Sant Chat was built on the other set of defaults because Sant Limited builds for sites that need to stay installed. A chatbot plugin is an inbound channel, and inbound channels deserve engineering discipline that matches their surface. The six decisions above are the shape of that discipline inside Sant Chat.
If you run a WordPress site and need a chatbot built to sit on the public web without compromising the posture of the site around it, install Sant Chat AI from the WordPress.org plugin directory. If you are shipping a product and want the decisions above applied to your own codebase before it goes live, Sant Launch services covers that work. If the site is already live and the concern is keeping the posture honest across updates, Sant Cloud carries that discipline on an ongoing schedule.