Add critsec to rp2040 xfer, check endpoint status by gemarcano · Pull Request #2474 · hathach/tinyusb

gemarcano · 2024-02-16T08:04:39Z

Describe the PR
hw_endpoint_lock_update is unimplemented for the rp2040 port. This PR takes a stab at implementing it. Without it, if the USB port is disconnected, say with tud_disconnect, a race condition can happen where a transfer is in progress and the USB IRQ can fire, calling reset_non_control_endpoints and leaving the ongoing transfer in a bad state, as it tries to access endpoint data structures that are now zero'd.

While debugging that main issue, I also found some cases of the rp2040 port assuming that an endpoint data structure exists for all endpoints, which is not the case as endpoint 0 has none. I've also included a few simple fixes for these.

Additional context
I've documented a lot of my debugging attempts in this discussion: #1764

I'm not super familiar with the tinyusb coding style, and I'm not sure my lock implementation is right (very naive, likely bad reference counting, might need atomic_int instead of int?).

Slion · 2024-07-12T18:41:46Z

This did not work so well for me as already mentioned there: #1764 (comment)
Still, very interesting stuff, thanks for sharing.

Slion · 2024-10-07T19:39:11Z

A lot has change in my project since I first came across this issue and now somehow even with the workarounds I had built-in I found a use case where I would always get PANIC ep 0 out was already available

So I eventually decided to checkout your branch gemarcano:rp2040_irq_lock_fix and sure enough it worked 🥳

Then I decided to remove the workaround I had in place to take care of the worst cases. It's basically filtering out some HID reports causing multiple USB remounts. Without it still did not panic but the USB connection seems to be broken somehow, not responding. I'll have to leave that workaround for now but it looks like I'll keep using your patch for the time being.

Slion · 2024-10-07T21:20:41Z

As I was trying to rebase that on the tip of master I realised the tip of master works just fine too. Same symptom as described above. So I'm afraid I can't tell if that patch helps in any way after all. What's sure is that it works just as well as the tip of master apparently. I could rebase it with no conflict too.

Slion · 2024-10-19T05:11:10Z

Tip of master still gave me that Panic after all, so I've started trying that patch again. Sure enough, I have not seen that Panic since. A couple of crashes and reboots maybe but possibly unrelated.

gemarcano · 2024-12-06T20:28:00Z

There were some recent changes in the rp2040 code adding ISO endpoints, so I had to tweak this PR a little bit to carry over the null-pointer dereference checks. The main change is that ep->configured = true; in dcd_rp2040.c is now set in the hw_endpoint_init function so it is set when either type of endpoint is initialized. This is similar to what happens in hcd_rp2040.c.

Thinking about it more, I should probably split the locking implementation from the null-pointer deref checks/fixes. Thoughts?

FearLabsAudio · 2025-03-05T19:34:23Z

@gemarcano The last commit ca0d53e Is working well for me.

RetiredWizard · 2025-09-12T19:09:33Z

I have been working on building CircuitPython for the Olimex RP2350pc and have modified the CircuitPython core to allow an rp2350 build to use the native usb pins for usb host instead of the normal CircuitPython device usage. I believe this is the first board attempting to use tinyUSB/RP2040 host with CircuitPython (non-PIO).

I seem to be running into the issue this PR is attempting to address as my initial build immediately would panic in rp2040_usb.c with "ep xx was already available". When I applied the changes from this PR the panic moved to hcd_rp2040.c with "Seq Error: [0] = 0xnnnn [1] = 0xnnnn". Which makes me think that there is still an unguarded transfer being initiated somewhere.

I don't know if it's any help but I also discovered that simply commenting out the original panic statement in rp2040_usb.c (panic("ep %02X was already available", ep->ep_addr);) and not applying this PR left me with what appears to be a fully functioning firmware. I exercised it for about 30 minutes without issue.

I was also able to largely mitigate the original panic by introducing "wait" loops in the CircuitPython core which effectively turned the usb (mouse) reads blocking. CircuitPython was using a short timeout period with the read transactions to provide a non-blocking read. This results in lots of transactions being started and then aborted. Adding wait loops to the core avoids the panic, however for reasons I'm not entirely sure (but I suspect it has to do with a callback routine not be used for transfer aborts) the wait ends up blocking any other USB transfers effectively disabling USB keyboard input when mouse reads are being performed.

I'll continue to poke around and see if I can identify any unguarded transfers but the USB protocol is all a little out of my depth. I do suspect/hope that this PR is on the right path though.

Edit: After 2-3 hours of exercising (playing minesweeper) the firmware that had the panic statement commented out finally crashed into safe mode, as I probably should have expected.....

- Implemented a critical section for the rp2040 port, which now prevents an IRQ from clearing the endpoint data structures while a transfer was in progress, which would then lead to a null pointer derefence.

gemarcano · 2026-02-21T21:34:57Z

From what I can tell, latest changes in the rp2040 code fix the null pointer problems, so in theory I can now just focus on the actual issue, the race condition between the endpoint structs being cleared. I'm not sure what's the best way to keep track if the structs have all been initialized, so I just pulled out the ep->configured out of the #ifdef block it was put under so we can use it in general for host or device mode (I'm not sure if that may lead to a conflict... I don't think so? I don't think you can have both configured at the same time).

Right now all I can currently do is make sure this compiles. The hardware I had been using for this is currently packed away. It would be good for people using tud_disconnect with the rp2040 to test this further to confirm there aren't any unexpected deadlocks.

gemarcano mentioned this pull request Feb 24, 2024

Using FreeRTOS-Kernel-Heap3 library leads to deadlock across cores, race condition on core spinlock raspberrypi/pico-sdk#1646

Closed

gemarcano force-pushed the rp2040_irq_lock_fix branch from ae077b0 to bbf9ea0 Compare March 5, 2024 06:40

gemarcano force-pushed the rp2040_irq_lock_fix branch from bbf9ea0 to f2f0d25 Compare March 12, 2024 19:48

gemarcano force-pushed the rp2040_irq_lock_fix branch from f2f0d25 to 5ee7ae9 Compare June 22, 2024 17:49

gemarcano force-pushed the rp2040_irq_lock_fix branch from 5ee7ae9 to dcf472c Compare July 16, 2024 17:49

gemarcano force-pushed the rp2040_irq_lock_fix branch 2 times, most recently from 5441fb3 to 2318575 Compare November 13, 2024 06:16

gemarcano force-pushed the rp2040_irq_lock_fix branch from 2318575 to 1eb62ba Compare November 25, 2024 18:22

gemarcano force-pushed the rp2040_irq_lock_fix branch from 1eb62ba to f5b2459 Compare December 6, 2024 20:25

gemarcano force-pushed the rp2040_irq_lock_fix branch from f5b2459 to ca0d53e Compare February 24, 2025 00:15

gemarcano force-pushed the rp2040_irq_lock_fix branch from ca0d53e to 056520e Compare April 11, 2025 00:17

MrJake222 mentioned this pull request Apr 27, 2025

dfu download sometimes hangs tinyvision-ai-inc/pico-ice#56

Closed

samblenny mentioned this pull request Sep 12, 2025

USB Host: Unplugging device can break usb.core.find() adafruit/circuitpython#10563

Open

RetiredWizard mentioned this pull request Oct 4, 2025

rp2350: native tinyusb host mode adafruit/circuitpython#10652

Closed

gemarcano force-pushed the rp2040_irq_lock_fix branch from 056520e to 71cb582 Compare January 6, 2026 18:36

Add critsec to rp2040 xfer, check endpoint status

48ae658

- Implemented a critical section for the rp2040 port, which now prevents an IRQ from clearing the endpoint data structures while a transfer was in progress, which would then lead to a null pointer derefence.

gemarcano force-pushed the rp2040_irq_lock_fix branch from 71cb582 to 48ae658 Compare February 21, 2026 21:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add critsec to rp2040 xfer, check endpoint status#2474

Add critsec to rp2040 xfer, check endpoint status#2474
gemarcano wants to merge 1 commit intohathach:masterfrom
gemarcano:rp2040_irq_lock_fix

gemarcano commented Feb 16, 2024

Uh oh!

Slion commented Jul 12, 2024 •

edited

Loading

Uh oh!

Slion commented Oct 7, 2024 •

edited

Loading

Uh oh!

Slion commented Oct 7, 2024 •

edited

Loading

Uh oh!

Slion commented Oct 19, 2024

Uh oh!

gemarcano commented Dec 6, 2024

Uh oh!

FearLabsAudio commented Mar 5, 2025

Uh oh!

RetiredWizard commented Sep 12, 2025 •

edited

Loading

Uh oh!

gemarcano commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

gemarcano commented Feb 16, 2024

Uh oh!

Slion commented Jul 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Slion commented Oct 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Slion commented Oct 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Slion commented Oct 19, 2024

Uh oh!

gemarcano commented Dec 6, 2024

Uh oh!

FearLabsAudio commented Mar 5, 2025

Uh oh!

RetiredWizard commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemarcano commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Slion commented Jul 12, 2024 •

edited

Loading

Slion commented Oct 7, 2024 •

edited

Loading

Slion commented Oct 7, 2024 •

edited

Loading

RetiredWizard commented Sep 12, 2025 •

edited

Loading