Event-driven system and subscribers missing events

AI Thread Summary
Event-driven systems face challenges when subscribers miss events, which can lead to a contaminated state or incorrect processing. Solutions include allowing subscribers to reset their state or pull events from a broker to ensure correct order processing. Time synchronization is crucial in these systems, with protocols like NTP providing sufficient accuracy for timestamping events. Checkpointing subscriber states can help manage missed events by allowing a rollback to a known good state. Ultimately, determining the relevance of missing events is essential for maintaining system integrity.
SlurrerOfSpeech
Let's say I have a service that publishes events, like

e0 ("Bought 100 shares of AAPL")
e1 ("Bought 100 shares of T")
e2 ("Sold 500 shares of TSLA")

and there exist stateful services subscribing to the events and whose state depends on the events being processed successfully and in the correct order.

There are many things that can go wrong on the subscription side:
  • A subscribing service fails to process an event and is not able to try to re-process it, leading to a contaminated state.
  • A service "successfully" processes the event, but a bug in the processing means the result is wrong. This is effectively equivalent to the first bullet point.
Should the subscribing services have a way of "resetting" their state once such a problem occurs?

For example, let's say a service processed e0 and e2 but not e1, because e1 somehow got lost. Maybe the subscribing service keeps a record of the events it has processed and knows, once it sees e2, that it needs to process e1 first and can get it from some service that stores all the events.
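
To make the idea concrete, here is a minimal sketch of that gap detection. It assumes the publisher attaches a consecutive sequence number to each event and that a full event store is available; `Event`, `Subscriber`, and `event_store` are hypothetical names for illustration, not part of any particular messaging library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int        # hypothetical sequence number assigned by the publisher
    payload: str


@dataclass
class Subscriber:
    event_store: dict                      # stand-in for a service that keeps every event
    next_seq: int = 0                      # sequence number expected next
    processed: list = field(default_factory=list)

    def apply(self, event: Event) -> None:
        # Real processing would update the service's state here.
        self.processed.append(event.payload)
        self.next_seq = event.seq + 1

    def on_event(self, event: Event) -> None:
        if event.seq < self.next_seq:
            return  # duplicate or already handled
        # Fill any gap from the store before applying the event that revealed it.
        for missing in range(self.next_seq, event.seq):
            self.apply(self.event_store[missing])
        self.apply(event)


events = [
    Event(0, "Bought 100 shares of AAPL"),
    Event(1, "Bought 100 shares of T"),
    Event(2, "Sold 500 shares of TSLA"),
]
store = {e.seq: e for e in events}

sub = Subscriber(event_store=store)
sub.on_event(events[0])
sub.on_event(events[2])   # e1 was lost in transit, so it is fetched from the store first
```

This only covers events that are lost; it does not help with the second bullet, where an event was processed incorrectly.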
 
jedishrfu said:
It might be better for the subscribers to pull the requests from a broker in this case so that events are processed in the correct order.
And there you are in the middle of what I was doing the last ten years of my professional life - the problem of "time stamping" an event with the correct universal time. After all, there might be several brokers - how do you ensure that the time stamp of an event is correct?

Some years ago, I published an insight here (https://www.physicsforums.com/insights/time-synchronization-across-switched-ethernet/) which discussed the clock synchronization problem for various accuracy requirements. For human systems (like the broker problem), the NTP protocol (with an estimated synchronization accuracy of about 2 ms) is more than precise enough. The only problem is that the system clock will drift between synchronizations, so a timestamped event must somehow report the time of the last synchronization and the clock drift measured between the last two synchronizations.
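
As a rough illustration of what such a timestamp could carry, the sketch below bundles the local clock reading with the last synchronization time and the measured drift. The field names and the linear drift correction are assumptions made for this example; they are not taken from NTP or from the insight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedEvent:
    payload: str
    local_time: float      # local clock reading at the event, in seconds
    last_sync_time: float  # local clock reading at the most recent sync, in seconds
    drift_rate: float      # measured between the last two syncs: seconds gained per second


def corrected_time(event: StampedEvent) -> float:
    """Estimate the synchronized time, assuming the drift stayed linear since the last sync."""
    elapsed = event.local_time - event.last_sync_time
    return event.local_time - event.drift_rate * elapsed


# A clock running 20 ppm fast, 100 s after its last synchronization.
stamped = StampedEvent("Bought 100 shares of T", 1000.0, 900.0, 20e-6)
estimate = corrected_time(stamped)   # about 2 ms earlier than the local reading
```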

For a more thorough discussion of time synchronization, read the insight.
 
I was referring to an MQ broker, where producer programs write messages to a queue and consumer programs read messages from the queue in a transactional scheme. This way, if the consumer fails, it can restart without missing a transaction and still process messages in the correct order. The transactional feature is important because a message is not dropped from the queue until the transaction completes. However, it may slow the system down when the message load is very heavy, as in stock ticker systems.
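
A toy in-memory version of that behaviour might look like the following. It is not the API of any real MQ product; `TransactionalQueue` and `consume` are made-up names, and the point is only that the message leaves the queue after the handler succeeds, not before.

```python
from collections import deque


class TransactionalQueue:
    """Toy queue: a message is removed only after its handler completes."""

    def __init__(self, messages):
        self._messages = deque(messages)

    def __len__(self):
        return len(self._messages)

    def consume(self, handler):
        if not self._messages:
            return False
        message = self._messages[0]   # peek, do not remove yet
        handler(message)              # an exception here leaves the message queued
        self._messages.popleft()      # "commit": drop the message only now
        return True


queue = TransactionalQueue(["e0", "e1", "e2"])
processed = []
crash_once = {"e1"}   # simulate a consumer that fails the first time it sees e1


def handler(message):
    if message in crash_once:
        crash_once.discard(message)
        raise RuntimeError(f"consumer crashed on {message}")
    processed.append(message)


while len(queue):
    try:
        queue.consume(handler)
    except RuntimeError:
        pass  # the consumer "restarts" and picks up the same message again
```

After the simulated crash, the consumer sees e1 again before e2, so nothing is skipped and the order is preserved.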

Nice insight, by the way. I think MQ systems and database systems have these notions embedded within them; at least, I'm pretty sure distributed partitioned database schemes need this to work correctly.
 
SlurrerOfSpeech said:
For example, let's say a service processed e0 and e2 but not e1, because e1 somehow got lost. Maybe the subscribing service keeps a record of the events it has processed and knows, once it sees e2, that it needs to process e1 first and can get it from some service that stores all the events.
This is more an issue of "resyncing" than "resetting". In general, it will not be possible to "unservice" e2 - but if that is possible, then you could unwind all transactions since the mis-step. A more likely solution would be to periodically checkpoint your servicer's state. So if I checkpoint at e100, e200, and e300 then discover at e377 that I missed e267, I can go back to e200 and process forward from that point.
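
Here is a minimal sketch of that checkpoint-and-replay scheme. It assumes events carry sequence numbers starting at 1 and that a complete event log is available for replay; `Servicer`, `recover`, and the running-total state are hypothetical stand-ins for whatever the real service tracks.

```python
import copy

CHECKPOINT_INTERVAL = 100


class Servicer:
    def __init__(self):
        self.state = {"total": 0, "last_seq": 0}
        # Checkpoint 0 is the empty initial state, so there is always something to fall back on.
        self.checkpoints = {0: copy.deepcopy(self.state)}

    def apply(self, seq, delta):
        self.state["total"] += delta
        self.state["last_seq"] = seq
        if seq % CHECKPOINT_INTERVAL == 0:
            self.checkpoints[seq] = copy.deepcopy(self.state)

    def recover(self, missed_seq, event_log):
        """Roll back to the last checkpoint before the missed event and replay from the log."""
        base = max(s for s in self.checkpoints if s < missed_seq)
        resume_to = self.state["last_seq"]
        self.state = copy.deepcopy(self.checkpoints[base])
        # Checkpoints taken after the missed event are contaminated, so discard them.
        self.checkpoints = {s: c for s, c in self.checkpoints.items() if s <= base}
        for seq in range(base + 1, resume_to + 1):
            self.apply(seq, event_log[seq])


# Events e1..e377, each adding its own sequence number to a running total.
event_log = {seq: seq for seq in range(1, 378)}

servicer = Servicer()
for seq, delta in event_log.items():
    if seq != 267:                  # e267 is missed the first time through
        servicer.apply(seq, delta)

servicer.recover(267, event_log)    # back to e200, then forward through e377
```

Replaying rebuilds the e300 checkpoint from clean state, which matters because the original e300 checkpoint was taken after the missed event.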
It is also possible that you can determine whether the missing event matters anymore. If you are keeping a list of the most recent 20 events, losing an event before that will not matter.
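
For the "most recent 20 events" case, the relevance check could be sketched like this, assuming each event has an increasing sequence number. `missing_event_matters` is a hypothetical helper written for this example.

```python
from collections import deque

WINDOW = 20


def missing_event_matters(missing_seq, recent):
    """Return True if the missing event would still be inside the retained window."""
    if len(recent) < recent.maxlen:
        return True                   # window not full yet, so every event still counts
    return missing_seq > recent[0]    # older than the oldest retained event: already irrelevant


recent = deque(maxlen=WINDOW)         # keeps only the newest 20 sequence numbers
for seq in range(100):
    if seq not in (50, 90):           # e50 and e90 were lost
        recent.append(seq)

old_gap_matters = missing_event_matters(50, recent)     # e50 would have been evicted anyway
recent_gap_matters = missing_event_matters(90, recent)  # e90 belongs in the current window
```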
 