Positive acknowledgement protocols do not guarantee delivery

I’ve recently had a deja-vu while reviewing a proprietary application level protocol; the protocol wasn’t robust with respect to loss of the connection at the transport layer. The reason for this was a design flaw I’ve seen before.

The proprietary protocol relied on a positive acknowledgement transport protocol (like TCP/IP) to ensure that unacknowledged packets got retransmitted. However, lost packets are typically not retransmitted indefinitely. TCP/IP protocol stacks, for example, only try to retransmit unacknowledged packets a limited number of times before aborting the connection.

So, if a connection breaks, how much data can a sender assume that the reciever actual got? Not much, actually. Lack of a positive acknowledgement of a packet could mean that the receiver never got the packet. However, it could also mean that the receiver actually got the packet, but the acknowledgement was somehow lost on its way back to the sender. A broken connection may therefore potentially leave the state of the two peers out-of-synch. When the transport layers become ready to exchange data again, the application layers must somehow be able to continue their protocol without knowing each others precise state.

The best strategy to cope with this problem depends on the problem domain/protocol. Retransmission of the last PDU may be one option, but it will require that the protocol can cope with duplicate PDU’s. Another solution (but often more involved) would be to have the application level protocol let peers probe each others state before continuing normal protocol flow.

Much the same conclusions can be drawn for negative acknowledgement protocols, by the way.