In a previous article, I explored Event-Driven Architecture from the conceptual side -- events vs. commands, Event Sourcing, CQRS, Saga patterns, and the cultural shift required to move from request-response thinking to event-driven thinking.
Theory is essential. But theory without code is a PowerPoint.
This post is about that transition. I built an open-source microservice called Mars Enterprise Kit Lite that implements EDA with Java 25, Spring Boot 4.0, Kafka (via Redpanda), and PostgreSQL 16 using Onion Architecture. The project is real, it runs, and you can clone it right now.
But here is the twist: the project has a deliberate flaw. It implements the Dual Write anti-pattern -- and I left it there on purpose.
Understanding the problem is the first step to solving it.
Here is a scenario most of us have faced.
A customer places an order. Your service saves it to PostgreSQL. Then it publishes an order.created event to Kafka so downstream services can react -- inventory, billing, notifications.
Two writes. Two systems. One operation.
What happens when the database commit succeeds but the Kafka publish fails? The order exists in the database. No event was published. Downstream consumers never learn about it. Inventory is never reserved. The billing service never charges.
sequenceDiagram
participant HR as HTTP Request
participant OS as Order Service
participant DB as Database
participant K as Kafka
HR->>OS: POST /orders
OS->>DB: INSERT order
DB-->>OS: OK
OS-xK: publish("order.created") ❌
Note over OS,K: FAILURE — event never delivered
The reverse is equally dangerous: Kafka receives the event, but the database transaction rolls back. Now downstream consumers act on an order that was never persisted.
No retry. No compensation. Silent inconsistency.
It works... until it doesn't.
The Dual Write problem occurs when a service writes to two separate systems -- such as a database and a message broker -- without atomic guarantees across both. If the first write succeeds but the second fails, the systems become silently inconsistent.
A @Transactional annotation covers the database. But Kafka lives outside that transactional boundary. There is no native atomicity across both. This gap is what the Mars Enterprise Kit Lite project exposes -- intentionally.
Later in this article, I show how the Transactional Outbox Pattern closes that gap -- and how the Mars Enterprise Kit Pro implements the pattern end-to-end.
Now let's look at the consistency gap more closely.
// app/ module -- OrderService.java
@Transactional
public UUID createOrder(Set<OrderItem> items, UUID customerId) {
return createOrderUseCase.execute(
new CreateOrderUseCase.Input(items, customerId));
}
Inside createOrderUseCase.execute(), the code saves to PostgreSQL and publishes to Kafka. The @Transactional annotation wraps the database operation, but Kafka is not part of that transaction.
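To make the gap concrete, here is roughly what runs inside that transaction. This is a simplified sketch, not the repository's exact code -- method and accessor names are illustrative:

```java
// Simplified sketch of the use case body -- illustrative names, see the repository for the real code.
// It runs inside the @Transactional boundary opened by OrderService.createOrder().
public UUID execute(Input input) {
    Order order = Order.create(input.customerId(), input.items());

    // Write #1: goes through JPA and commits (or rolls back) with the surrounding transaction
    orderRepository.save(order);

    // Write #2: handed straight to Kafka -- the broker knows nothing about the DB transaction
    orderEventPublisher.publish(OrderCreatedEvent.from(order));

    return order.id();
}
```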
Here is the timeline of what can go wrong:
t=0ms -> POST /orders arrives
t=1ms -> @Transactional begins
t=3ms -> orderRepository.save(order) [DB write, within transaction]
t=5ms -> orderEventPublisher.publish() [Kafka send, OUTSIDE transaction]
t=6ms -> Kafka acknowledges [Event is now in Kafka]
t=7ms -> DB commit fails [Network error, disk full, constraint violation]
Result: Event exists in Kafka. Order does NOT exist in PostgreSQL.
Downstream consumers process a ghost order.
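To picture what that means downstream, imagine a consumer like this (hypothetical code -- the inventory service is not part of the Lite project):

```java
// Hypothetical downstream consumer, only to illustrate the "ghost order" effect
@KafkaListener(topics = "order.created")
public void onOrderCreated(OrderCreatedEvent event) {
    // The event arrived, but the order it references was rolled back upstream
    Order order = orderRepository.findById(event.orderId())
            .orElseThrow(() -> new IllegalStateException(
                    "order.created received for an order that does not exist: " + event.orderId()));

    inventoryService.reserve(order); // never reached -- or retried forever, depending on error handling
}
```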
Three failure scenarios hide in that handful of milliseconds:

1. The database write commits, but the Kafka publish fails: the order exists and no event is ever delivered (the Lost Event).
2. The Kafka publish succeeds, but the database commit fails and rolls back: an event exists for an order that does not (the Phantom Event).
3. The process crashes between the two writes: whichever write happened first is the only one that happened at all.

The code looks correct. It compiles. It passes unit tests. It works in dev. It works... until it doesn't.
This is why the flaw is intentional. You need to see it to understand why patterns like the Transactional Outbox exist.
Talking about failure is one thing. Watching it happen is another.
The project includes two chaos testing scenarios you can run to see the Dual Write breaking on your machine. This is not a simulation -- it is real inconsistency between PostgreSQL and Kafka.
Scenario 1 is the Phantom Event. The problem: an order.created event exists in Kafka, but the order does not exist in PostgreSQL. Any consumer processing this event will reference a phantom order.
The project includes a built-in chaos endpoint (POST /chaos/phantom-event) that uses an AOP interceptor to force a DB rollback after the Kafka event has already been published. To activate it, start the application with the chaos profile:
# Start the app with the chaos profile
cd app && SPRING_PROFILES_ACTIVE=chaos mvn spring-boot:run
# Trigger the phantom event scenario
curl -s -X POST http://localhost:8082/chaos/phantom-event \
-H "Content-Type: application/json" \
-d '{
"customerId": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{"productId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "quantity": 2, "unitPrice": 149.95}
]
}'
Response:
{
"orderId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"existsInDb": false,
"eventSentToKafka": true,
"dbRolledBack": true,
"explanation": "PHANTOM EVENT: The order.created event was published to Kafka, but the order does NOT exist in PostgreSQL. Any consumer processing this event will reference a non-existent order."
}
Verify it yourself -- the order does not exist in the database, but the event is in Kafka:
# Order does NOT exist in PostgreSQL (rolled back)
docker-compose exec postgres psql -U mars -d orders_db -c \
"SELECT * FROM orders WHERE id = '<orderId>';"
# → (0 rows)
# Event DOES exist in Kafka
docker-compose exec redpanda rpk topic consume order.created --num 1 --offset end
# → Event payload with the phantom orderId
How it works internally:
PhantomEventChaosAspect is an AOP @Around advice that intercepts ChaosOrderExecutor.execute(). It lets the use case run completely (DB INSERT + Kafka publish), then throws a PhantomEventSimulationException. Since the exception occurs inside the @Transactional boundary, Spring rolls back the DB -- but KafkaTemplate.send() already dispatched the event. All chaos beans use @Profile("chaos") and do not exist in the default profile.
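In code, the idea boils down to something like this -- a stripped-down sketch based on the description above, not the repository's exact implementation:

```java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;

@Aspect
@Component
@Profile("chaos") // the aspect only exists when the chaos profile is active
public class PhantomEventChaosAspect {

    @Around("execution(* *..ChaosOrderExecutor.execute(..))")
    public Object forceRollbackAfterPublish(ProceedingJoinPoint pjp) throws Throwable {
        // Let the use case run to completion: DB INSERT + KafkaTemplate.send() both happen here
        Object result = pjp.proceed();

        // Throwing an unchecked exception inside the @Transactional boundary makes Spring
        // roll back the DB write -- but the order.created event is already on its way to the broker
        throw new PhantomEventSimulationException(
                "Simulated failure after Kafka publish, result was: " + result);
    }
}
```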
Scenario 2 is the Lost Event. The problem: an order is persisted in PostgreSQL, but the order.created event is never published. Downstream consumers never learn the order was created.
This scenario does not need a special endpoint. Just stop Redpanda before creating an order:
# 1. Create a baseline order (everything healthy)
curl -s -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{"customerId":"550e8400-e29b-41d4-a716-446655440000","items":[{"productId":"6ba7b810-9dad-11d1-80b4-00c04fd430c8","quantity":1,"unitPrice":50.00}]}'
# → 201 Created
# 2. Kill Kafka
docker-compose stop redpanda
# 3. Try to create another order
curl -s -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{"customerId":"aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee","items":[{"productId":"11111111-2222-3333-4444-555555555555","quantity":1,"unitPrice":99.99}]}'
# 4. Bring Kafka back
docker-compose start redpanda
sleep 10
# 5. Compare: DB has 2 orders, Kafka has only 1 event
docker-compose exec postgres psql -U mars -d orders_db -c \
"SELECT COUNT(*) FROM orders;"
# → 2
docker-compose exec redpanda rpk topic consume order.created \
--format '%v\n' | wc -l
# → 1
Order #2 exists in the database but has no corresponding event in Kafka. No downstream consumer knows it exists.
Both scenarios are caused by the same root issue: no atomicity between PostgreSQL and Kafka.
| | Scenario 1: Phantom Event | Scenario 2: Lost Event |
|---|---|---|
| Trigger | AOP forces DB rollback after publish | Kafka is down during order creation |
| PostgreSQL | Order does NOT exist (rolled back) | Order EXISTS (committed) |
| Kafka | Event EXISTS (already sent) | Event does NOT exist (publish failed) |
| Impact | Consumers process a non-existent order | Consumers never learn the order was created |
| Reproduction | POST /chaos/phantom-event (requires chaos profile) | docker-compose stop redpanda + POST /orders |
| Fix | Transactional Outbox Pattern | Transactional Outbox Pattern |
Both failures are silent in production. No errors in the logs, no alerts, no retries. The system continues operating with inconsistent state between the database and the message broker.
Want to reproduce these scenarios on your machine? The repository is open: mars-enterprise-kit-lite. Five minutes to see the Dual Write breaking for real.
The entire stack runs with Docker Compose. Clone the repository and you are three commands away from a running system:
# 1. Start infrastructure (PostgreSQL 16 + Redpanda)
docker-compose up -d
# 2. Build all modules
mvn clean install
# 3. Run the application
cd app && mvn spring-boot:run
Create an order:
curl -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{
"customerId": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{
"productId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"quantity": 2,
"unitPrice": 149.95
}
]
}'
# Response: 201 Created
# { "orderId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }
The Redpanda Console at http://localhost:8888 lets you inspect Kafka topics, see the order.created event, and verify the payload.
Redpanda is a Kafka-compatible streaming platform that runs without ZooKeeper or a JVM. It implements the Kafka protocol natively, so your application code does not change -- only the broker does.
| Component | Technology | Version |
|---|---|---|
| Language | Java | 25 |
| Framework | Spring Boot | 4.0.3 |
| Build Tool | Maven | Multi-module |
| Database | PostgreSQL | 16 (alpine) |
| Messaging | Redpanda (Kafka-compatible) | v24.3.1 |
| Event Format | JSON (Jackson) | - |
| ORM | Spring Data JPA | - |
| Schema Management | Flyway | - |
| Testing | JUnit 5, Mockito, TestContainers, REST Assured | - |
If you find this useful, drop a star on the repository — it helps other developers discover the project.
Here is where things get interesting.
The project was designed from the start to be operated by an AI agent. Not as an afterthought -- as a first-class design constraint.
The CLAUDE.md file is not just documentation. It is a prompt disguised as a README. It tells Claude Code the architecture rules, dependency directions, domain invariants, naming conventions, and how to run the project end-to-end.
The .mars/docs/ directory contains the AI knowledge base -- architecture decision records, module responsibilities, and coding conventions.
Custom Claude Code commands and skills extend the workflow:
- /generate-prp -- generates a Product Requirements Prompt for a new feature, based on the existing architecture
- /execute-prp -- implements the feature following TDD (Red, Green, Refactor)
- chaos-phantom-event -- runs the Phantom Event scenario end-to-end: starts the app with the chaos profile, calls POST /chaos/phantom-event, and verifies the event exists in Kafka but the order does not exist in PostgreSQL
- chaos-testing -- runs the Lost Event scenario: stops Redpanda, creates an order, and verifies the order exists in the database but the event was lost in Kafka

The AI can spin up the entire environment, create an order via REST, verify the Kafka event, trigger a cancellation through the consumer, and validate the final state. A full end-to-end smoke test:
1. docker compose up -d (PostgreSQL 16 + Redpanda)
2. Wait for services to be healthy
3. Verify Kafka topics exist (order.created + order.cancelled)
4. POST /orders -> validate 201 Created + orderId
5. Consume order.created event -> validate orderId matches
6. Publish order.cancelled -> { orderId, reason: "smoke-test" }
7. GET /orders/{orderId} -> validate status = CANCELLED
8. docker compose down
9. Report PASS or FAIL with logs
Beyond the smoke test, Claude Code runs the chaos skills autonomously:
# Phantom Event -- proves Kafka has an event for an order that does not exist in the DB
Run the chaos-phantom-event skill
# Lost Event -- proves the DB has an order with no event in Kafka
Run the chaos-testing skill with scenario lost-event
But the design is not about replacing developers. When you structure a project so an AI can operate it, you are forced to make things explicit. Architecture rules live in a document, not in someone's head. Conventions are written down, not tribal knowledge. The test sequence is a script, not a mental checklist.
This benefits every developer on the team, not just the AI. The CLAUDE.md doubles as onboarding documentation. The smoke test sequence is the acceptance criteria for "the system works."
AI-First design is Context Engineering applied to development infrastructure.
After building this project and several other event-driven systems, the lesson I keep coming back to is this: make knowledge explicit. Architecture rules, conventions, the steps to run the system -- put them in a CLAUDE.md, an ADR, or a dev4dev document. Your future self (and your AI assistant) will thank you.

You just saw the problem breaking. Phantom events, lost events, silent inconsistency. In dev, this is an exercise. In production, this is a Friday night with your phone ringing.
The Dual Write problem in this project is intentional. I left it there so you can see it, understand it, and feel why it matters. In a production system, you would never ship this without a solution.
The Transactional Outbox Pattern solves the Dual Write problem by writing the event to an outbox table within the same database transaction as the business data. A separate process polls the outbox and publishes events to the message broker. Because the business write and the event write share a single transaction, atomicity is guaranteed.
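The heart of the pattern fits in a few lines. A minimal sketch with illustrative names -- the shape of the idea, not the Pro implementation:

```java
// Transactional Outbox sketch -- illustrative names, simplified on purpose
@Transactional
public UUID createOrder(Set<OrderItem> items, UUID customerId) {
    Order order = Order.create(customerId, items);
    orderRepository.save(order); // business write

    // The event is written to an outbox table in the SAME database transaction:
    // either both rows commit together, or neither does.
    outboxRepository.save(new OutboxMessage(
            "order.created", order.id(), toJson(order)));

    return order.id();
}
```

A relay process -- a polling publisher or change data capture (e.g. Debezium) -- then reads the outbox table and publishes each row to Kafka, retrying until the broker acknowledges. That relay is the "separate process" mentioned above.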
The Mars Enterprise Kit Pro solves it with this pattern — and goes much further. It includes Helm charts, CI/CD pipelines, Apache Avro with Schema Registry, OpenTelemetry observability, and patterns like SAGA and CQRS. The production-grade evolution of what the Lite version teaches.
If the Lite version already has chaos testing with AOP and automated failure reproduction, imagine what the Pro version delivers.
You need to understand the problem before you appreciate the solution. That is why the Lite comes first.
| Feature | Lite (Free) | Enterprise Kit Pro |
|---|---|---|
| Onion Architecture | Yes | Yes |
| Kafka + PostgreSQL | Yes | Yes |
| AI-First design | Yes | Yes |
| Transactional Outbox Pattern | No | Yes |
| Apache Avro + Schema Registry | No | Yes |
| Helm / Kubernetes | No | Yes |
| CI/CD pipelines | No | Yes |
| OpenTelemetry observability | No | Yes |
| SAGA / CQRS patterns | No | Yes |
| Production-Ready | No | Yes |
The Lite teaches you the problem. The Pro gives you the solution. See what changes →
Event-Driven Architecture is not a theoretical exercise. It is a set of tradeoffs that show up in real code, in real failure modes, in real production incidents.
The Mars Enterprise Kit Lite gives you a working codebase to explore those tradeoffs. Clone it, run it, break it. And now you can prove the problem on your own machine: fire POST /chaos/phantom-event and watch the ghost event appear in Kafka, or stop Redpanda and watch the event disappear. This is not theory -- it is real inconsistency you can observe, debug, and understand.
Read the domain layer and notice the absence of frameworks. Trace the Dual Write through the code. Run the chaos tests. Then look at how the Transactional Outbox Pattern eliminates that gap.
Mars Enterprise Kit Lite is free, open-source, and runs on your machine in 5 minutes. Clone it, start Docker Compose, run the chaos tests, and watch the Dual Write breaking for real.
github.com/andrelucasti/mars-enterprise-kit-lite
Read the CLAUDE.md and let Claude Code reproduce the Dual Write failures for you.
If this project helped you understand the Dual Write problem, drop a star on the repo. It costs nothing and helps other developers find this content.
Mars Enterprise Kit Pro solves the Dual Write with the Transactional Outbox Pattern implemented end-to-end — and includes everything you need to go to production: Helm charts, CI/CD, Apache Avro with Schema Registry, OpenTelemetry observability, SAGA, and CQRS.
Discover Mars Enterprise Kit Pro →
If you have questions or want to discuss event-driven patterns, feel free to connect on LinkedIn.
