In a previous article, I explored Event-Driven Architecture from the conceptual side -- events vs. commands, Event Sourcing, CQRS, Saga patterns, and the cultural shift required to move from request-response thinking to event-driven thinking.
Theory is essential. But theory without code is a PowerPoint.
This post is about that transition. I built an open-source microservice called Mars Enterprise Kit Lite that implements EDA with Java 25, Spring Boot 4.0, Kafka (via Redpanda), and PostgreSQL 16 using Onion Architecture. The project is real, it runs, and you can clone it right now.
But here is the twist: the project has a deliberate flaw. It implements the Dual Write anti-pattern -- and I left it there on purpose.
Understanding the problem is the first step to solving it.
Here is a scenario most of us have faced.
A customer places an order. Your service saves it to PostgreSQL. Then it publishes an order.created event to Kafka so downstream services can react -- inventory, billing, notifications.
Two writes. Two systems. One operation.
What happens when the database commit succeeds but the Kafka publish fails? The order exists in the database. No event was published. Downstream consumers never learn about it. Inventory is never reserved. The billing service never charges.
sequenceDiagram
participant HR as HTTP Request
participant OS as Order Service
participant DB as Database
participant K as Kafka
HR->>OS: POST /orders
OS->>DB: INSERT order
DB-->>OS: OK
OS-xK: publish("order.created") ❌
Note over OS,K: FAILURE — event never delivered
The reverse is equally dangerous: Kafka receives the event, but the database transaction rolls back. Now downstream consumers act on an order that was never persisted.
No retry. No compensation. Silent inconsistency.
It works... until it doesn't.
The Dual Write problem occurs when a service writes to two separate systems -- such as a database and a message broker -- without atomic guarantees across both. If the first write succeeds but the second fails, the systems become silently inconsistent.
A @Transactional annotation covers the database. But Kafka lives outside that transactional boundary. There is no native atomicity across both. This gap is what the Mars Enterprise Kit Lite project exposes -- intentionally.
Later in this article, I show how the Transactional Outbox Pattern closes that gap -- and how the Mars Enterprise Kit Pro implements the pattern end-to-end.
Now let's look at the consistency gap more closely.
// app/ module -- OrderService.java
@Transactional
public UUID createOrder(Set<OrderItem> items, UUID customerId) {
return createOrderUseCase.execute(
new CreateOrderUseCase.Input(items, customerId));
}
Inside createOrderUseCase.execute(), the code saves to PostgreSQL and publishes to Kafka. The @Transactional annotation wraps the database operation, but Kafka is not part of that transaction.
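To make the gap concrete, here is roughly what runs inside that transaction. This is a simplified sketch, not the repository's exact code -- method and accessor names are illustrative:

```java
// Simplified sketch of the use case body -- illustrative names, see the repository for the real code.
// It runs inside the @Transactional boundary opened by OrderService.createOrder().
public UUID execute(Input input) {
    Order order = Order.create(input.customerId(), input.items());

    // Write #1: goes through JPA and commits (or rolls back) with the surrounding transaction
    orderRepository.save(order);

    // Write #2: handed straight to Kafka -- the broker knows nothing about the DB transaction
    orderEventPublisher.publish(OrderCreatedEvent.from(order));

    return order.id();
}
```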
Here is the timeline of what can go wrong:
t=0ms -> POST /orders arrives
t=1ms -> @Transactional begins
t=3ms -> orderRepository.save(order) [DB write, within transaction]
t=5ms -> orderEventPublisher.publish() [Kafka send, OUTSIDE transaction]
t=6ms -> Kafka acknowledges [Event is now in Kafka]
t=7ms -> DB commit fails [Network error, disk full, constraint violation]
Result: Event exists in Kafka. Order does NOT exist in PostgreSQL.
Downstream consumers process a ghost order.
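To picture what that means downstream, imagine a consumer like this (hypothetical code -- the inventory service is not part of the Lite project):

```java
// Hypothetical downstream consumer, only to illustrate the "ghost order" effect
@KafkaListener(topics = "order.created")
public void onOrderCreated(OrderCreatedEvent event) {
    // The event arrived, but the order it references was rolled back upstream
    Order order = orderRepository.findById(event.orderId())
            .orElseThrow(() -> new IllegalStateException(
                    "order.created received for an order that does not exist: " + event.orderId()));

    inventoryService.reserve(order); // never reached -- or retried forever, depending on error handling
}
```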
Three failure scenarios hide in that handful of milliseconds:

1. The database write commits, but the Kafka publish fails: the order exists and no event is ever delivered (the Lost Event).
2. The Kafka publish succeeds, but the database commit fails and rolls back: an event exists for an order that does not (the Phantom Event).
3. The process crashes between the two writes: whichever write happened first is the only one that happened at all.

The code looks correct. It compiles. It passes unit tests. It works in dev. It works... until it doesn't.
This is why the flaw is intentional. You need to see it to understand why patterns like the Transactional Outbox exist.
Talking about failure is one thing. Watching it happen is another.
The project includes two chaos testing scenarios you can run to see the Dual Write breaking on your machine. This is not a simulation -- it is real inconsistency between PostgreSQL and Kafka.
Scenario 1 is the Phantom Event. The problem: an order.created event exists in Kafka, but the order does not exist in PostgreSQL. Any consumer processing this event will reference a phantom order.
The project includes a built-in chaos endpoint (POST /chaos/phantom-event) that uses an AOP interceptor to force a DB rollback after the Kafka event has already been published. To activate it, start the application with the chaos profile:
# Start the app with the chaos profile
cd app && SPRING_PROFILES_ACTIVE=chaos mvn spring-boot:run
# Trigger the phantom event scenario
curl -s -X POST http://localhost:8082/chaos/phantom-event \
-H "Content-Type: application/json" \
-d '{
"customerId": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{"productId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "quantity": 2, "unitPrice": 149.95}
]
}'
Response:
{
"orderId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"existsInDb": false,
"eventSentToKafka": true,
"dbRolledBack": true,
"explanation": "PHANTOM EVENT: The order.created event was published to Kafka, but the order does NOT exist in PostgreSQL. Any consumer processing this event will reference a non-existent order."
}
Verify it yourself -- the order does not exist in the database, but the event is in Kafka:
# Order does NOT exist in PostgreSQL (rolled back)
docker-compose exec postgres psql -U mars -d orders_db -c \
"SELECT * FROM orders WHERE id = '<orderId>';"
# → (0 rows)
# Event DOES exist in Kafka
docker-compose exec redpanda rpk topic consume order.created --num 1 --offset end
# → Event payload with the phantom orderId
How it works internally:
PhantomEventChaosAspect is an AOP @Around advice that intercepts ChaosOrderExecutor.execute(). It lets the use case run completely (DB INSERT + Kafka publish), then throws a PhantomEventSimulationException. Since the exception occurs inside the @Transactional boundary, Spring rolls back the DB -- but KafkaTemplate.send() already dispatched the event. All chaos beans use @Profile("chaos") and do not exist in the default profile.
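In code, the idea boils down to something like this -- a stripped-down sketch based on the description above, not the repository's exact implementation:

```java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;

@Aspect
@Component
@Profile("chaos") // the aspect only exists when the chaos profile is active
public class PhantomEventChaosAspect {

    @Around("execution(* *..ChaosOrderExecutor.execute(..))")
    public Object forceRollbackAfterPublish(ProceedingJoinPoint pjp) throws Throwable {
        // Let the use case run to completion: DB INSERT + KafkaTemplate.send() both happen here
        Object result = pjp.proceed();

        // Throwing an unchecked exception inside the @Transactional boundary makes Spring
        // roll back the DB write -- but the order.created event is already on its way to the broker
        throw new PhantomEventSimulationException(
                "Simulated failure after Kafka publish, result was: " + result);
    }
}
```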
Scenario 2 is the Lost Event. The problem: an order is persisted in PostgreSQL, but the order.created event is never published. Downstream consumers never learn the order was created.
This scenario does not need a special endpoint. Just stop Redpanda before creating an order:
# 1. Create a baseline order (everything healthy)
curl -s -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{"customerId":"550e8400-e29b-41d4-a716-446655440000","items":[{"productId":"6ba7b810-9dad-11d1-80b4-00c04fd430c8","quantity":1,"unitPrice":50.00}]}'
# → 201 Created
# 2. Kill Kafka
docker-compose stop redpanda
# 3. Try to create another order
curl -s -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{"customerId":"aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee","items":[{"productId":"11111111-2222-3333-4444-555555555555","quantity":1,"unitPrice":99.99}]}'
# 4. Bring Kafka back
docker-compose start redpanda
sleep 10
# 5. Compare: DB has 2 orders, Kafka has only 1 event
docker-compose exec postgres psql -U mars -d orders_db -c \
"SELECT COUNT(*) FROM orders;"
# → 2
docker-compose exec redpanda rpk topic consume order.created \
--format '%v\n' | wc -l
# → 1
Order #2 exists in the database but has no corresponding event in Kafka. No downstream consumer knows it exists.
Both scenarios are caused by the same root issue: no atomicity between PostgreSQL and Kafka.
| | Scenario 1: Phantom Event | Scenario 2: Lost Event |
|---|---|---|
| Trigger | AOP forces DB rollback after publish | Kafka is down during order creation |
| PostgreSQL | Order does NOT exist (rolled back) | Order EXISTS (committed) |
| Kafka | Event EXISTS (already sent) | Event does NOT exist (publish failed) |
| Impact | Consumers process a non-existent order | Consumers never learn the order was created |
| Reproduction | POST /chaos/phantom-event (requires chaos profile) | docker-compose stop redpanda + POST /orders |
| Fix | Transactional Outbox Pattern | Transactional Outbox Pattern |
Both failures are silent in production. No errors in the logs, no alerts, no retries. The system continues operating with inconsistent state between the database and the message broker.
Want to reproduce these scenarios on your machine? The repository is open: mars-enterprise-kit-lite. Five minutes to see the Dual Write breaking for real.
The entire stack runs with Docker Compose. Clone the repository and you are three commands away from a running system:
# 1. Start infrastructure (PostgreSQL 16 + Redpanda)
docker-compose up -d
# 2. Build all modules
mvn clean install
# 3. Run the application
cd app && mvn spring-boot:run
Create an order:
curl -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{
"customerId": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{
"productId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"quantity": 2,
"unitPrice": 149.95
}
]
}'
# Response: 201 Created
# { "orderId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }
The Redpanda Console at http://localhost:8888 lets you inspect Kafka topics, see the order.created event, and verify the payload.
Redpanda is a Kafka-compatible streaming platform that runs without ZooKeeper or a JVM. It implements the Kafka protocol natively, so your application code does not change -- only the broker does.
| Component | Technology | Version |
|---|---|---|
| Language | Java | 25 |
| Framework | Spring Boot | 4.0.3 |
| Build Tool | Maven | Multi-module |
| Database | PostgreSQL | 16 (alpine) |
| Messaging | Redpanda (Kafka-compatible) | v24.3.1 |
| Event Format | JSON (Jackson) | - |
| ORM | Spring Data JPA | - |
| Schema Management | Flyway | - |
| Testing | JUnit 5, Mockito, TestContainers, REST Assured | - |
If you find this useful, drop a star on the repository — it helps other developers discover the project.
Here is where things get interesting.
The project was designed from the start to be operated by an AI agent. Not as an afterthought -- as a first-class design constraint.
The CLAUDE.md file is not just documentation. It is a prompt disguised as a README. It tells Claude Code the architecture rules, dependency directions, domain invariants, naming conventions, and how to run the project end-to-end.
The .mars/docs/ directory contains the AI knowledge base -- architecture decision records, module responsibilities, and coding conventions.
Custom Claude Code commands and skills extend the workflow:
- /generate-prp -- generates a Product Requirements Prompt for a new feature, based on the existing architecture
- /execute-prp -- implements the feature following TDD (Red, Green, Refactor)
- chaos-phantom-event -- runs the Phantom Event scenario end-to-end: starts the app with the chaos profile, calls POST /chaos/phantom-event, and verifies the event exists in Kafka but the order does not exist in PostgreSQL
- chaos-testing -- runs the Lost Event scenario: stops Redpanda, creates an order, and verifies the order exists in the database but the event was lost in Kafka

The AI can spin up the entire environment, create an order via REST, verify the Kafka event, trigger a cancellation through the consumer, and validate the final state. A full end-to-end smoke test:
1. docker compose up -d (PostgreSQL 16 + Redpanda)
2. Wait for services to be healthy
3. Verify Kafka topics exist (order.created + order.cancelled)
4. POST /orders -> validate 201 Created + orderId
5. Consume order.created event -> validate orderId matches
6. Publish order.cancelled -> { orderId, reason: "smoke-test" }
7. GET /orders/{orderId} -> validate status = CANCELLED
8. docker compose down
9. Report PASS or FAIL with logs
Beyond the smoke test, Claude Code runs the chaos skills autonomously:
# Phantom Event -- proves Kafka has an event for an order that does not exist in the DB
Run the chaos-phantom-event skill
# Lost Event -- proves the DB has an order with no event in Kafka
Run the chaos-testing skill with scenario lost-event
But the design is not about replacing developers. When you structure a project so an AI can operate it, you are forced to make things explicit. Architecture rules live in a document, not in someone's head. Conventions are written down, not tribal knowledge. The test sequence is a script, not a mental checklist.
This benefits every developer on the team, not just the AI. The CLAUDE.md doubles as onboarding documentation. The smoke test sequence is the acceptance criteria for "the system works."
AI-First design is Context Engineering applied to development infrastructure.
After building this project and several other event-driven systems, the lesson I keep coming back to is this: make knowledge explicit. Architecture rules, conventions, the steps to run the system -- put them in a CLAUDE.md, an ADR, or a dev4dev document. Your future self (and your AI assistant) will thank you.

You just saw the problem breaking. Phantom events, lost events, silent inconsistency. In dev, this is an exercise. In production, this is a Friday night with your phone ringing.
The Dual Write problem in this project is intentional. I left it there so you can see it, understand it, and feel why it matters. In a production system, you would never ship this without a solution.
The Transactional Outbox Pattern solves the Dual Write problem by writing the event to an outbox table within the same database transaction as the business data. A separate process polls the outbox and publishes events to the message broker. Because the business write and the event write share a single transaction, atomicity is guaranteed.
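The heart of the pattern fits in a few lines. A minimal sketch with illustrative names -- the shape of the idea, not the Pro implementation:

```java
// Transactional Outbox sketch -- illustrative names, simplified on purpose
@Transactional
public UUID createOrder(Set<OrderItem> items, UUID customerId) {
    Order order = Order.create(customerId, items);
    orderRepository.save(order); // business write

    // The event is written to an outbox table in the SAME database transaction:
    // either both rows commit together, or neither does.
    outboxRepository.save(new OutboxMessage(
            "order.created", order.id(), toJson(order)));

    return order.id();
}
```

A relay process -- a polling publisher or change data capture (e.g. Debezium) -- then reads the outbox table and publishes each row to Kafka, retrying until the broker acknowledges. That relay is the "separate process" mentioned above.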
The Mars Enterprise Kit Pro solves it with this pattern — and goes much further. It includes Helm charts, CI/CD pipelines, Apache Avro with Schema Registry, OpenTelemetry observability, and patterns like SAGA and CQRS. The production-grade evolution of what the Lite version teaches.
If the Lite version already has chaos testing with AOP and automated failure reproduction, imagine what the Pro version delivers.
You need to understand the problem before you appreciate the solution. That is why the Lite comes first.
| Feature | Lite (Free) | Enterprise Kit Pro |
|---|---|---|
| Onion Architecture | Yes | Yes |
| Kafka + PostgreSQL | Yes | Yes |
| AI-First design | Yes | Yes |
| Transactional Outbox Pattern | No | Yes |
| Apache Avro + Schema Registry | No | Yes |
| Helm / Kubernetes | No | Yes |
| CI/CD pipelines | No | Yes |
| OpenTelemetry observability | No | Yes |
| SAGA / CQRS patterns | No | Yes |
| Production-Ready | No | Yes |
The Lite teaches you the problem. The Pro gives you the solution. See what changes →
Event-Driven Architecture is not a theoretical exercise. It is a set of tradeoffs that show up in real code, in real failure modes, in real production incidents.
The Mars Enterprise Kit Lite gives you a working codebase to explore those tradeoffs. Clone it, run it, break it. And now you can prove the problem on your own machine: fire POST /chaos/phantom-event and watch the ghost event appear in Kafka, or stop Redpanda and watch the event disappear. This is not theory -- it is real inconsistency you can observe, debug, and understand.
Read the domain layer and notice the absence of frameworks. Trace the Dual Write through the code. Run the chaos tests. Then look at how the Transactional Outbox Pattern eliminates that gap.
Mars Enterprise Kit Lite is free, open-source, and runs on your machine in 5 minutes. Clone it, start Docker Compose, run the chaos tests, and watch the Dual Write breaking for real.
github.com/andrelucasti/mars-enterprise-kit-lite
Read the CLAUDE.md and let Claude Code reproduce the Dual Write failures for you.
If this project helped you understand the Dual Write problem, drop a star on the repo. It costs nothing and helps other developers find this content.
Mars Enterprise Kit Pro solves the Dual Write with the Transactional Outbox Pattern implemented end-to-end — and includes everything you need to go to production: Helm charts, CI/CD, Apache Avro with Schema Registry, OpenTelemetry observability, SAGA, and CQRS.
Discover Mars Enterprise Kit Pro →
If you have questions or want to discuss event-driven patterns, feel free to connect on LinkedIn.
