Event-driven system and subscribers missing events

SlurrerOfSpeech · Jan 4, 2019

Let's say I have a service that publishes events, like

e₀ ("Bought 100 shares of AAPL")
e₁ ("Bought 100 shares of T")
e₂ ("Sold 500 shares of TSLA")

and there exist stateful services subscribing to the events and whose state depends on the events being processed successfully and in the correct order.

There are many things that can go wrong on the subscription side:

A subscribing service fails to process an event and is not able to try to re-process it, leading to a contaminated state.
A service "successfully" processes the event, but a bug in the processing means it actually failed to process it. This is effectively equivalent to the first case.

Should the subscribing services have a way of "resetting" their state once such a problem occurs?

For example, let's say a service processed e₀ and e₂ but not e₁, perhaps because e₁ somehow got lost. Maybe the subscribing service keeps a record of the events it has processed and knows, once it sees e₂, that it needs to process e₁ first and can get it from some service that stores all the events.
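To make the idea concrete, here is a minimal sketch of a subscriber that detects a gap and fills it before moving on. It assumes each event carries a sequence number and that some store can return an event by that number; `Event`, `EventStore`, `Subscriber`, and `fetch` are hypothetical names for illustration, not part of any particular library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int       # assumed: publisher assigns consecutive sequence numbers
    payload: str


class EventStore:
    """Hypothetical service that stores every published event by sequence number."""

    def __init__(self, events):
        self._by_seq = {e.seq: e for e in events}

    def fetch(self, seq):
        return self._by_seq[seq]


@dataclass
class Subscriber:
    store: EventStore
    next_seq: int = 0                      # sequence number expected next
    processed: list = field(default_factory=list)

    def on_event(self, event):
        if event.seq < self.next_seq:
            return                         # duplicate or late arrival, already applied
        while self.next_seq < event.seq:   # gap: fetch the missing events first
            self._apply(self.store.fetch(self.next_seq))
        self._apply(event)

    def _apply(self, event):
        self.processed.append(event.payload)
        self.next_seq = event.seq + 1


e0 = Event(0, "Bought 100 shares of AAPL")
e1 = Event(1, "Bought 100 shares of T")
e2 = Event(2, "Sold 500 shares of TSLA")

sub = Subscriber(store=EventStore([e0, e1, e2]))
sub.on_event(e0)
sub.on_event(e2)   # e1 was lost in transit; it is fetched from the store first
sub.on_event(e1)   # a late copy of e1 is ignored
```

After these calls, `sub.processed` holds the three payloads in publication order, even though e₁ never arrived through the subscription.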

jedishrfu · Jan 4, 2019

It might be better for the subscribers to pull the requests from a broker in this case so that events are processed in the correct order.

ZeroMQ was made for these kinds of systems:

http://zguide.zeromq.org/page:all

As you read down, the guide presents various microservice architectures along with their pros and cons.

http://zeromq.org/intro:read-the-manual

Svein · Jan 4, 2019

jedishrfu said:

It might be better for the subscribers to pull the requests from a broker in this case so that events are processed in the correct order.

And there you are in the middle of what I was doing the last ten years of my professional life - the problem of "time stamping" an event with the correct universal time. After all, there might be several brokers - how do you ensure that the time stamp of an event is correct?

Some years ago, I published an insight here (https://www.physicsforums.com/insights/time-synchronization-across-switched-ethernet/) which discussed the clock synchronization problem for various accuracy requirements. For human systems (like the broker problem), the NTP protocol (with an estimated synchronization accuracy of about 2 ms) is more than precise enough. The only problem is that the system clock will drift between synchronizations, so a timestamped event must somehow report the time of the last synchronization and the clock drift measured between the last two synchronizations.
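As a rough illustration of what such a timestamp could carry, the sketch below attaches the last synchronization time and a measured drift rate to each event and applies a first-order correction. The `StampedEvent` type, its field names, and the linear drift model are assumptions made for this example, not something taken from NTP or from the insight article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedEvent:
    payload: str
    local_time: float      # local clock reading when the event occurred, in seconds
    last_sync_time: float  # local clock reading at the last synchronization
    drift_rate: float      # assumed: seconds gained per second, measured between
                           # the last two synchronizations (positive = clock runs fast)

    def corrected_time(self):
        """First-order estimate of the true time, assuming constant drift since the last sync."""
        elapsed = self.local_time - self.last_sync_time
        return self.local_time - self.drift_rate * elapsed


# A clock running 50 ppm fast, 100 s after its last synchronization,
# has gained roughly 5 ms.
event = StampedEvent(
    payload="Bought 100 shares of T",
    local_time=1000.0,
    last_sync_time=900.0,
    drift_rate=50e-6,
)
correction = event.local_time - event.corrected_time()
```

With these fields a consumer can at least estimate how far each broker's timestamp may be off before ordering events from different sources.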

For a more thorough discussion of time synchronization, read the insight.

jedishrfu · Jan 4, 2019

I was referring to an MQ broker where producer programs write messages to a queue and consumer programs read messages from the queue in a transactional scheme. In this way, if the consumer fails, it can restart without missing a transaction and still process the messages in the correct order. The transactional feature is important because a message won't be dropped from the queue until the transaction is completed. However, the feature may slow down the system if the message load is very heavy, as in stock ticker systems.
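The transactional scheme can be sketched with a toy in-memory queue: a message stays at the head of the queue until the consumer commits it, so a consumer that fails can restart and pick up exactly where it left off. `TransactionalQueue`, `peek`, `commit`, and `consume` are hypothetical names for this illustration and do not correspond to the API of any real MQ product.

```python
from collections import deque


class TransactionalQueue:
    """Toy queue: a message is removed only when the consumer commits it."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def peek(self):
        return self._messages[0] if self._messages else None

    def commit(self):
        self._messages.popleft()


def consume(queue, handler):
    """Process messages in order; on failure, leave the message queued and stop."""
    handled = 0
    while (message := queue.peek()) is not None:
        try:
            handler(message)
        except Exception:
            break              # simulated crash: the message is not committed
        queue.commit()
        handled += 1
    return handled


queue = TransactionalQueue()
for text in ("Bought 100 shares of AAPL",
             "Bought 100 shares of T",
             "Sold 500 shares of TSLA"):
    queue.put(text)

applied = []
fail_once = {"armed": True}


def handler(message):
    # Fail the first time the second message is seen, to simulate a consumer crash.
    if fail_once["armed"] and message.endswith(" T"):
        fail_once["armed"] = False
        raise RuntimeError("consumer crashed")
    applied.append(message)


first_run = consume(queue, handler)    # stops at the failing message
second_run = consume(queue, handler)   # restart: resumes from the same message
```

The first run handles one message and stops; the second run handles the remaining two, so `applied` ends up complete and in order. The cost is that every message needs a commit step, which is where the slowdown under heavy load comes from.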

Nice insight, by the way. I think MQ systems and database systems have these notions embedded within them; at least, I'm pretty sure distributed partitioned database schemes need this to work correctly.

.Scott · Jan 4, 2019

SlurrerOfSpeech said:

For example, let's say a service processed e₀ and e₂ but not e₁, let's say because e₁ somehow got lost. Maybe the subscribing service keeps a record of events it processed and knows once it sees e₂ that it needs to first process e₁ and can get it from some service that stores all the events.

This is more an issue of "resyncing" than "resetting". In general, it will not be possible to "unservice" e₂ - but if that is possible, then you could unwind all transactions since the mis-step. A more likely solution would be to periodically checkpoint your servicer's state. So if I checkpoint at e100, e200, and e300 then discover at e377 that I missed e267, I can go back to e200 and process forward from that point.
It is also possible that you can determine whether the missing event matters anymore. If you are keeping a list of the most recent 20 events, losing an event before that will not matter.
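Here is a minimal sketch of the checkpoint-and-replay approach, assuming the full event log is available from somewhere and that the service's state is a simple table of share positions. `CheckpointingService`, `resync`, and the `(ticker, delta)` event shape are hypothetical choices for this example.

```python
import copy


class CheckpointingService:
    """Keeps share positions and snapshots its state every `interval` events."""

    def __init__(self, interval):
        self.interval = interval
        self.state = {}              # ticker -> net shares held
        self.applied = 0             # number of events applied so far
        self.checkpoints = {0: {}}   # applied count -> snapshot of state

    def apply(self, event):
        ticker, delta = event
        self.state[ticker] = self.state.get(ticker, 0) + delta
        self.applied += 1
        if self.applied % self.interval == 0:
            self.checkpoints[self.applied] = copy.deepcopy(self.state)

    def resync(self, log, missed_index):
        """Restore the last checkpoint taken before the missed event, then replay the log."""
        start = max(n for n in self.checkpoints if n <= missed_index)
        self.state = copy.deepcopy(self.checkpoints[start])
        self.applied = start
        # Checkpoints taken after the gap are contaminated, so discard them.
        self.checkpoints = {n: s for n, s in self.checkpoints.items() if n <= start}
        for event in log[start:]:
            self.apply(event)


log = [("AAPL", 100), ("T", 100), ("TSLA", -500), ("AAPL", -50), ("T", 25),
       ("TSLA", 200), ("AAPL", 10), ("T", -100), ("TSLA", 300), ("AAPL", 40)]

service = CheckpointingService(interval=3)
for index, event in enumerate(log):
    if index != 6:                   # the event at index 6 is lost in transit
        service.apply(event)

contaminated = dict(service.state)
service.resync(log, missed_index=6)  # go back to the checkpoint at 6 and replay
```

In this run the service rolls back to the checkpoint taken after six events and replays only the last four, rather than reprocessing the whole log.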
