Project

General

Profile

Actions

Bug #5286

closed

The reader sometimes uses a wrong bSeq number in the response

Added by rousseau over 2 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
-
Start date:
10/29/2021
Due date:
% Done:

100%

Spec Reference:
CCID v1.1 3.1.3 Bulk-in, Bulk-out pipes

Description

I notice that the reader may answer to a command with an invalid bSeq number, i.e. not the bSeq number used in the command sent to the reader.

For example I have the sequence (complete log attached):

00000033 [139745030735808] -> 000002 65 00 00 00 00 02 0C 00 00 00 
00000024 [139745030735808] <- 000002 81 00 00 00 00 02 16 01 00 00 

00000011 [139745030735808] -> 000002 65 00 00 00 00 02 0D 00 00 00 
00000019 [139745030735808] <- 000002 81 00 00 00 00 02 0C 01 00 00 
00000019 [139745030735808] <- 000002 81 00 00 00 00 02 0D 01 00 00 

00000009 [139745030735808] -> 000002 65 00 00 00 00 02 0E 00 00 00 
00000013 [139745030735808] <- 000002 81 00 00 00 00 02 0E 01 00 00 

00000018 [139744979597056] -> 000002 65 00 00 00 00 02 0F 00 00 00
00000022 [139744979597056] <- 000002 81 00 00 00 00 02 0F 01 00 00 

First CCID frame: "65 00 00 00 00 02 0C 00 00 00"
The driver sends a 0x65 (PC_to_RDR_GetSlotStatus) command to slot 2 with bSeq = 0x0C

The reader then answers with "81 00 00 00 00 02 16 01 00 00"
Here the bSeq value is 0x16 which is invalid and should be 0x0C as in the driver command.

The driver then sends "65 00 00 00 00 02 0D 00 00 00"
uses bSeq = 0x0D and the reader respond with 2 CCID frames
81 00 00 00 00 02 0C 01 00 00
81 00 00 00 00 02 0D 01 00 00

The first frame uses the bSeq for the initial command.
The second frame uses the correct bSeq number.

The CCID specification in 3.1.3 Bulk-in, Bulk-out pipes write:

To maintain track of all the commands, the Host sets a unique command identifier bSeq
for each command sent to the CCID. To mark command completion the CCID sends the
response with the same bSeq number. The driver will not send a new command to a slot
until the ending response to the last command to that slot is received. If it is determined
that the current command has erred or timed out, the response can be
abandoned/aborted by issuing the class specific abandon/abort command to the control
endpoint.

It is not clear is the bSeq should be unique for the reader or for each slot.
For example I don't know if the reader should support two commands with the same bSeq value but for 2 different slots.

It looks like the sysmoOCTSIM firmware can handle the same bSeq on different slots. So that is nice.

I attach 2 log files:
  • log_CCID.txt contains only the CCID frame
  • log.txt contains the complete pcscd logs

You can search for the message "Duplicate frame detected" in the log.txt file.

It looks like the problem occurs only for the first CCID answer from a slot.
It does not always occurs.

My CCID driver has code to handle frame duplication and the problem occurs only during slot initialization so the problem was not noticed until now.


Files

log_CCID.txt log_CCID.txt 12.8 KB rousseau, 10/29/2021 09:32 AM
log.txt log.txt 139 KB rousseau, 10/29/2021 09:43 AM
Actions #1

Updated by rousseau over 2 years ago

I am using the latest firmware 0.2.67-30c2

Actions #2

Updated by laforge over 2 years ago

  • Assignee set to Hoernchen
Actions #3

Updated by laforge over 2 years ago

I could imagine this is actually not a firmware bug, but a host software bug.

Are you sure that the '16' response is not a response to a much earlier CCID command?

This way it would make sense:
  • host is issuing bSeq 0C before it ever received a response for prior bSeq 16
  • first response from firmware is a response for 0C
  • second response from firmware is a resposne for 0D

afterwards it's synchronized again.

I'm not excluding any bugs on the firmware side, but to me the trace really looks like there's a mis-matching of responses to requests on the host/driver side.

Actions #4

Updated by laforge over 2 years ago

In this log file I don't see any command to slot 2 using bSeq 16. So it may be that the related message has been in a queue in the firmware for a longer time, maybe from before the log.txt even started.

So my theory remains that the bSeq=16 is a "late" response to some other command that has been sitting in some queue for [too?] long. I would find it highly implausible that the firmware would ever send two responses to a single command, like rousseau has assumed in the original analysis.

From a driver point of view, I would suggest do discard any unexpected bSeq and "wait" for the response with the correct bSeq. This way one avoids synchronization problems. The driver must never interpret a response with a non-matching bSeq.

Meanwhile, we'll see what may be causing messages to stay in a queue for too long time.

Actions #5

Updated by rousseau over 2 years ago

You are right.
I used wireshark to analyze the USB packets as described in https://ludovicrousseau.blogspot.com/2014/10/ccid-usb-spy-using-wireshark.html

The problematic packets are received by the driver before the driver sent any command.
The problem occurs when pcscd is stopped and restarted. In this case some packets are in the reader queues waiting. When pcscd is restarted these packets are sent to the driver before the driver send any command.

I modified my CCID driver to ignore packets with incorrect bSeq.

It is not a problem in the firmware. You can close this issue.

Actions #6

Updated by Hoernchen over 2 years ago

We've actually had one of those problems before, https://gerrit.osmocom.org/c/osmo-ccid-firmware/+/24770 - but in this case it's apparently really a problem with pending transfers while the host side dies, and there is nothing in the ccid spec I am aware of that handles this, so as far as I can tell the reader needs to be "drained" of stale transfers upon opening anyway.

Actions #7

Updated by laforge over 2 years ago

  • Status changed from New to Feedback
  • Assignee changed from Hoernchen to rousseau
  • % Done changed from 0 to 90

Hoernchen wrote in #note-6:

We've actually had one of those problems before, https://gerrit.osmocom.org/c/osmo-ccid-firmware/+/24770 - but in this case it's apparently really a problem with pending transfers while the host side dies, and there is nothing in the ccid spec I am aware of that handles this, so as far as I can tell the reader needs to be "drained" of stale transfers upon opening anyway.

I agreee, the pcscd/libcicd should at its very first operation after opening the IN endpoint drain any transfers that might be pending.

This is in general best practice for any kind of USB driver, I've done it in a number of libusb-based programs before. I'm not aware of any other reliable method to avoid stale/old in transfers, as the USB device has no idea when you are opening an endpoint or when you are closing it (e.g. due to program crash).

Leaving the ticket open regarding this feedback to rousseau

Actions #8

Updated by rousseau over 2 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 90 to 100

I agree with laforge. The CCID driver is fixed now.
I close the issue.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)