Throttle Stuck. firmware 3.63, dual focbox via can bus, xmatic via HM-10 module, vx1 remote via ppm

firmware 3.63,
dual focbox via can bus,
xmatic via HM-10 module,
vx1 remote via ppm

Recently updated my FOCBOXes to 3.63, and after 60 miles or so of travel the board continued to accelerate and ignored both braking and throttle. I had to bail out.

Could this be the same issue as the recent unity issue? cc @rpasichnyk @Deodand

I note the same USART priority changes seem to have been made in the latest update to the VESC firmware, but I will admit to not understanding the interaction that triggers the (very scary) issue.

The 3.63 commit switches the USART priorities from 12 to 3.

--- a/mcuconf.h
+++ b/mcuconf.h
@@ -277,12 +277,12 @@
 #define STM32_SERIAL_USE_UART4              TRUE
 #define STM32_SERIAL_USE_UART5              FALSE
 #define STM32_SERIAL_USE_USART6             TRUE
-#define STM32_SERIAL_USART1_PRIORITY        12
-#define STM32_SERIAL_USART2_PRIORITY        12
-#define STM32_SERIAL_USART3_PRIORITY        12
-#define STM32_SERIAL_UART4_PRIORITY         12
-#define STM32_SERIAL_UART5_PRIORITY         12
-#define STM32_SERIAL_USART6_PRIORITY        12
+#define STM32_SERIAL_USART1_PRIORITY        3
+#define STM32_SERIAL_USART2_PRIORITY        3
+#define STM32_SERIAL_USART3_PRIORITY        3
+#define STM32_SERIAL_UART4_PRIORITY         3
+#define STM32_SERIAL_UART5_PRIORITY         3
+#define STM32_SERIAL_USART6_PRIORITY        3
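
For readers unfamiliar with the numbering: on ARM Cortex-M parts such as the STM32, a numerically lower interrupt priority is more urgent, so going from 12 to 3 lifts the USART interrupts above anything left at 4 through 11. A minimal sketch of that rule follows. The helper name is hypothetical and is not taken from the VESC or ChibiOS source, and it ignores sub-priority grouping.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of ChibiOS or the VESC firmware.
 * Cortex-M rule: a numerically LOWER priority value is MORE urgent,
 * so an interrupt can preempt a running handler only if its value is lower. */
bool irq_can_preempt(unsigned new_irq_prio, unsigned running_irq_prio)
{
    return new_irq_prio < running_irq_prio;
}
```

With some other handler at, say, priority 5 (an illustrative value, not one read from the firmware), `irq_can_preempt(12, 5)` is false while `irq_can_preempt(3, 5)` is true, which is the kind of ordering change the diff introduces.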

The fix for Unity seemed to be reverting a similar change:


which @rpasichnyk called out as suspect.


If it wasn’t immediate, I doubt it’s the same issue. I mean, I could be wrong, but just doesn’t seem logical. I would swap the remote and RX and see if that resolves the issue, if the problem follows the remote or the FocBox.

Interesting. Did it occur for others immediately after the Unity firmware update? That wasn’t clear to me.

That priority change in the VESC code certainly looks exactly the same as the one that is blamed for the Unity issue, and I had over 800 miles on this setup before the firmware change. It’d be nice to have someone more in the know confirm it’s not subject to the same issue.

To your point, I guess the remote and receiver could have randomly failed in a constant PPM-on mode. Is there a way people reproduce that? :confused: I’d prefer not to fail on the road again if possible, obviously. :slight_smile:

This is what I would do…

  1. Flip it over on a table, turn it on and see if it’s still stuck. If not, then damn. That means reproducing the issue is gonna be tough.

  2. If it is still stuck, turn the remote off and see if it goes to neutral. If it doesn’t then it could very well be a remote issue.

  3. Turn everything off and check the PPM leads, from the RX to the FOCBOX, for shorts or broken connections. If everything looks OK, swap the remote out and see if it’s still stuck. If not, then the issue is the original remote.

@Deodand, can you advise on if this is the same issue as the Unity Firmware + METR Pro issue from a few weeks ago?

I did notice this in the change logs, I thought it was actually a different priority that I had changed (interrupt priority) but I was mistaken. It’s possible this could be the same bug. It should be harder to trigger with a single unit since the CPU is doing a bit less (just 1 motor) but that prioritization can lead to some weird behavior in my experience. I did notice he also changed the priority of the DMA stream and this:

#define CORTEX_SIMPLIFIED_PRIORITY FALSE
#define CORTEX_SIMPLIFIED_PRIORITY TRUE

So I’m not sure if the interaction between these settings might make the change permissible. It’s all a bit arcane to me as my CS background isn’t strong enough to understand some of the prioritization stuff and problems therein.


Could be the same issue, but only time will tell. With the Unity, several people experienced it. I always recommend limiting max ERPM, so that even if your board stops reacting to PPM input, it still doesn’t accelerate to crazy speeds.
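
To pick a max ERPM value for a given top speed, the usual relation is ERPM = motor RPM × pole pairs. The sketch below works that through. The function name and parameters are hypothetical, and wheel diameter, gearing and pole count have to come from your own build.

```c
/* Hypothetical helper: converts a desired top speed into an ERPM limit.
 * gear_ratio is motor turns per wheel turn (1.0 for a hub motor).
 * ERPM = mechanical motor RPM * pole pairs, and pole pairs = poles / 2. */
double speed_to_erpm(double speed_m_per_s, double wheel_diameter_m,
                     double gear_ratio, int motor_poles)
{
    const double pi = 3.14159265358979323846;
    double wheel_rpm = speed_m_per_s / (pi * wheel_diameter_m) * 60.0;
    double motor_rpm = wheel_rpm * gear_ratio;
    return motor_rpm * (motor_poles / 2.0);
}
```

As an illustration only, 10 m/s on a 0.1 m wheel with a direct-drive 14-pole motor comes out at roughly 13,400 ERPM.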


Ben and I and Jeff just had a little chat on Telegram. Outcome: Jeffrey + Benjamin will try to reproduce this behaviour ASAP.


It was not still stuck. After a power cycle I rode it home 6 miles, cautiously and slowly. I have not managed to reproduce it.

At home on the bench I practiced remote disconnects, using tinfoil as a Faraday cage and moving a few meters away. The disconnects behave well; I did 20 of them. After the 1000ms delay the motors go to 0 current as configured. I opened the remote and checked all the solder points / wires inside… they all seemed good. I wiggled them all while moving the throttle and saw no cutouts.
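
The timeout behaviour tested above can be pictured roughly like this. It is a sketch only, with hypothetical names, and is not the actual VESC implementation: if no valid PPM pulse has arrived within the configured window, the commanded current is forced to zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of an input-timeout failsafe; names are illustrative. */
bool ppm_timed_out(uint32_t now_ms, uint32_t last_valid_pulse_ms,
                   uint32_t timeout_ms)
{
    /* Unsigned subtraction stays correct when the millisecond counter wraps. */
    return (uint32_t)(now_ms - last_valid_pulse_ms) >= timeout_ms;
}

/* Passes the requested current through unless the input has timed out. */
float commanded_current(float requested_current_a, bool timed_out)
{
    return timed_out ? 0.0f : requested_current_a;
}
```

A check like this only helps when pulses actually stop arriving, which is what the bench disconnects exercise.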

Still to do: remove the enclosure and check the internal connections, then check the failure modes for receiver disconnect, BLE disconnect, and CAN disconnect.


Thank you, Jeff, Ben, Frank.


Internal wiring looks good.
Wiggled ppm wires and can wires at both ends and in between while running. No cutouts. All connectors are held in with hot glue.

Disconnected receiver while running. Primary motor stopped after 1s. Secondary stopped 1s later. On reconnect it starts up again.

Disconnected can bus while running. secondary motor stops after 1s. Reconnect and it starts.

Disconnected ble module (it’s on primary) while running. Motors run per normal. Xmatic disconnects. on reconnect of module xmatic reconnects.


Vesc-tool 1.27 is up now. PPM and a high UART priority now work seamlessly together.
Benjamin Vedder found a bug in ChibiOS which is used in the VESC software.
There is also a new FW for the Wand integrated.


Haven’t been able to try 1.27/3.64 because of can_bus oddities I’m observing:

For readers’ info, this bug is now patched.


Yeah. 3.65 is out. I’m riding it. Got some snow time ahead so could be a few weeks before I get many miles on it.

I did do some full speed runs with xmatic attached on a path I could hopefully bail or have time to power off. All Ok so far. :crossed_fingers:

Ok, I have 110 miles (177 km) on firmware 3.65 using the same setup with which the incident occurred (dual 4.10 FOCBOXes via CAN bus, Xmatic connected via the HM-10), with 3 trips through the exact same area where I had the incident.

No issues so far. :crossed_fingers:


180 miles no issues

Gonna switch over to the Metr Pro this weekend, so that’s the end of miles on this specific combination. With the Metr I will switch from a 9600 to a 115200 baud rate, so it could be a more taxing setup.
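
For a rough sense of how much more taxing: assuming the common 8N1 framing (10 bits on the wire per byte, an assumption rather than something confirmed for these modules), the ceiling on UART bytes per second scales directly with the baud rate. A small sketch with a hypothetical helper name:

```c
/* Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte.
 * Returns the maximum number of bytes the line can carry per second. */
unsigned long max_bytes_per_second(unsigned long baud)
{
    return baud / 10UL;
}
```

That gives 960 bytes/s at 9600 baud and 11520 bytes/s at 115200 baud, a factor of 12. If the serial driver takes one interrupt per received byte (also an assumption here), the worst-case UART interrupt rate grows by the same factor.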

207 more miles with Xmatic via HM-10 AND Metr on the other VESC. both actively recording.

No issues so far :crossed_fingers: ( except two over temp faults. )


Total of 609 miles on 3.65 with no recurrence of the issue.
416 miles with Xmatic via HM-10 AND Metr at the same time.

This incident was only one negative data point, but I consider firmware 3.63 dangerous.

and now fixed in later fw.

Thanks, all, for the fix and for paying attention.