
dhcpcd refuses to stop if interface link is down #587

@e-pirate

Hi, everyone.

During extensive (quite abusive) testing after switching from dhclient to dhcpcd on my ISP (WAN) interface, I found that dhcpcd v10.3 refuses to exit if the Ethernet link of the interface it manages goes down.

I performed a regular stop of the net.wan0 (OpenRC) interface with the Ethernet connector removed from the WAN interface, while dhcpcd was running on that interface and working correctly (with regard to DHCP and hook processing). The dhcpcd process was still running even after OpenRC timed out waiting for it to exit:

# /etc/init.d/net.wan0 stop
 * Bringing down interface wan0           
 *   Stopping dhcpcd on wan0 ...                       
sending signal ALRM to pid 25014                       
waiting for pid 25014 to exit             
<about a minute later>
pid 25014 failed to exit
#

The dhcpcd process remained running, ignoring SIGTERM. The only way to get rid of it was to send SIGKILL.
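For anyone who wants to reproduce the symptom without dhcpcd, the stop sequence above (send a termination signal, wait, then fall back to SIGKILL when the process does not exit) can be sketched in Python. This is a hypothetical helper, not OpenRC's or dhcpcd's code; the function names, the timeout values, and the use of a child process that ignores SIGTERM as a stand-in for the stuck dhcpcd are all assumptions for illustration, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, sig: int = signal.SIGTERM,
                 timeout: float = 5.0) -> str:
    """Hypothetical helper: ask `proc` to exit with `sig`, escalate to SIGKILL.

    Returns "exited" if the process stopped within `timeout` seconds,
    or "killed" if SIGKILL was needed (the manual workaround in this report).
    """
    proc.send_signal(sig)
    try:
        proc.wait(timeout=timeout)
        return "exited"
    except subprocess.TimeoutExpired:
        # Same situation as "pid ... failed to exit": force the process down.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "killed"


def spawn_child(ignore_sigterm: bool) -> subprocess.Popen:
    """Start a stand-in process (hypothetical); optionally make it ignore SIGTERM."""
    lines = ["import signal, time"]
    if ignore_sigterm:
        lines.append("signal.signal(signal.SIGTERM, signal.SIG_IGN)")
    # Report readiness only after the handler is installed, so the parent
    # never signals the child before it has decided how to react.
    lines.append("print('ready', flush=True)")
    lines.append("time.sleep(60)")
    proc = subprocess.Popen([sys.executable, "-c", "\n".join(lines)],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # block until the child prints 'ready'
    proc.stdout.close()
    return proc


if __name__ == "__main__":
    print(stop_process(spawn_child(ignore_sigterm=False), timeout=2.0))  # exited
    print(stop_process(spawn_child(ignore_sigterm=True), timeout=1.0))   # killed
```

The `sig` parameter is configurable because the OpenRC log above shows SIGALRM being sent, while the process was also observed ignoring SIGTERM; passing `signal.SIGALRM` mirrors the OpenRC case more closely.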

I then ran dhcpcd in the foreground to see what was going on:

# dhcpcd --nobackground --debug wan0
wan0: soliciting a DHCP lease
<Optical connector of the terminal/converter manually removed, the optical link is down, no connection with ISP equipment, Ethernet link is up>
wan0: sending DISCOVER (xid 0x2aeba72c), next in 3.8 seconds
wan0: sending DISCOVER (xid 0x2aeba72c), next in 8.4 seconds
wan0: sending DISCOVER (xid 0x2aeba72c), next in 15.7 seconds
wan0: sending DISCOVER (xid 0x2aeba72c), next in 31.1 seconds
wan0: sending DISCOVER (xid 0x2aeba72c), next in 64.1 seconds
<Optical connector returned, optical link goes up>
wan0: offered XX.XX.XX.66 from XX.XX.X.1
wan0: sending REQUEST (xid 0x2aeba72c), next in 4.8 seconds
wan0: acknowledged XX.XX.XX.66 from XX.XX.XX.1
wan0: leased XX.XX.XX.66 for 300 seconds
wan0: renew in 150 seconds, rebind in 262 seconds
wan0: writing lease: /var/lib/dhcpcd/wan0.lease
wan0: adding IP address XX.XX.XX.66/19 broadcast XX.XX.XX.255
wan0: adding route to XX.XX.XX.0/19
wan0: adding default route via XX.XX.XX.1
wan0: executing: /lib/dhcpcd/dhcpcd-run-hooks BOUND
<Ethernet connector manually removed from optical terminal/converter, Ethernet link goes down>
wan0: carrier lost
wan0: executing: /lib/dhcpcd/dhcpcd-run-hooks NOCARRIER
wan0: deleting IP address XX.XX.XX.66/19
wan0: deleting route to XX.XX.XX.0/19
wan0: deleting default route via XX.XX.XX.1
wan0: executing: /lib/dhcpcd/dhcpcd-run-hooks EXPIRE
wan0: executing: /lib/dhcpcd/dhcpcd-run-hooks EXPIRE <- why second EXPIRE?
^Creceived SIGINT, stopping
wan0: removing interface
^Creceived SIGINT, stopping
^Creceived SIGINT, stopping
^Creceived SIGINT, stopping

Again, the only way to stop dhcpcd and get the console prompt back was to send SIGKILL from outside.

Because dhclient is still available on this system, I wanted to confirm this is not a hardware or driver problem. I switched the WAN interface back to dhclient as its DHCP client and repeated exactly the same sequence: started the interface the standard OpenRC way, waited for dhclient to obtain a lease and run all the corresponding hooks, removed the connector from the server's WAN interface, and stopped the interface with the /etc/init.d/net.wan0 stop command. There were no problems: dhclient ran its hooks and exited cleanly, and OpenRC reported the interface as stopped. This confirms it is a dhcpcd issue.

This issue causes several problems:

  1. first of all, a misbehaving dhcpcd breaks OS interface management by keeping it (OpenRC in the case of Gentoo) waiting for the process to exit; this alone introduces a noticeable delay when stopping the interface (and therefore on reboot), slowing down all related processes;
  2. second, and even worse, the leftover dhcpcd process will interfere with the new dhcpcd process launched by the OS when the interface is restarted, potentially leaving that interface malfunctioning;
  3. a regular user unaware of the problem will be unable to recover if the interface link goes down mid-flight and they decide to restart the interface managed by dhcpcd;

Decades of experience show that WAN (ISP-facing) interfaces may go down at random and, during severe incidents, stay down for many hours. dhcpcd seems to recover correctly from carrier loss if it is left running to wait for the carrier to return, but given the unpredictable nature of remote ISP equipment, this dhcpcd behavior adds an extra layer of hard-to-debug problems.

This looks like a serious problem.
