Interface Keepalive

Photo Credits: Unsplash

Introduction

I recently acquired a mini-PC with the Intel I225-V network card. It is the reason I will always check for known NIC bugs on future computing purchases.

As it turns out, the I225-V suffers from well-known and widespread issues. In my experience, those issues have manifested as a single port (out of the box's four) that periodically just... stops working. Since I'm using the box as an edge device, it's been frustrating, to say the least.

Intel released an official fix for the card - but only for the Windows OS. Ideally, an official fix will be released for Debian (the OS I'm running), but I won't hold my breath. There's also a slew of articles and forum posts by other users attempting to fix the issue on various Linux distros, but I have yet to find one that works. Perhaps I'll try searching again in the future. For now, however, I've found a (kludgy) way of addressing the problem.

Scripting an Interface Reboot

My solution has 4 components: a cron job, two scripts, and a log file.

The Cron Job

/etc/cron.d/interface-toggle
SCRIPT=/root/int-toggle/toggle-physints.sh
*/2 * * * * root if [ -f "$SCRIPT" ]; then $SCRIPT; fi

Script #1

/root/int-toggle/toggle-physints.sh
#!/bin/bash
# call int-toggle for all physical interfaces

LOGFILE=/root/int-toggle/toggle.log
interfaces="enp1s0 enp2s0 enp3s0 enp4s0" # replace this with your interface names

for interface in $interfaces; do
	/root/int-toggle/int-toggle.sh "$interface"
done

# optional logging
# date=$(date +"%Y-%m-%d %H:%M:%S")
# echo "$date: completed run" | tee -a "$LOGFILE"
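
If hardcoding the interface names feels brittle, the list could instead be built from sysfs. The following is only a rough sketch, not something I run myself: it assumes that on Linux `/sys/class/net/<name>/device` exists only for hardware-backed interfaces (loopback, bridges, and VLANs lack it), and both the function name `list_physical_ifaces` and its optional root argument are made up for illustration. Note that it would also pick up Wi-Fi or USB adapters, so filter the result if your box has any.

```shell
#!/bin/bash
# Sketch: list network interfaces that are backed by real hardware.
# Assumption: /sys/class/net/<name>/device is present only for such interfaces.
# list_physical_ifaces and its optional root argument are hypothetical names.

list_physical_ifaces () {
	local root=${1:-/sys/class/net} path
	for path in "$root"/*; do
		# skip virtual interfaces (no backing device entry)
		[ -e "$path/device" ] || continue
		basename "$path"
	done
}

# Possible use in Script #1, replacing the hardcoded list:
#   interfaces=$(list_physical_ifaces | tr '\n' ' ')
```

The optional argument exists only so the function can be pointed at a test directory instead of the real sysfs tree.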

Script #2

/root/int-toggle/int-toggle.sh
#!/bin/bash
# toggle interface provided as arg until it works

LOGFILE=/root/int-toggle/toggle.log

log () {
	local msg=$1
	local timestamp
	timestamp=$(date +'%Y-%m-%d %H:%M:%S')
	# prepend a timestamp, then print to stdout and append to the log file
	echo "$timestamp $msg" | tee -a "$LOGFILE"
}

remove_pid () {
	[ -f "$pidfile" ] && rm "$pidfile"
}

interface=$1
pidfile=/tmp/toggle-${interface}.pid

if [ -f "$pidfile" ]
then
	log "aborting on interface $interface. script already running."
	exit 0
fi

trap remove_pid EXIT
touch "$pidfile"

start=$(date +'%Y-%m-%d %H:%M:%S')
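# NOTE: field 11 is where the state word (UP/DOWN) lands in this box's `ip link` output;
# the position shifts with the fields shown, so check your own output and adjust
status=$(ip link show "$interface" | grep -v link/ether | cut -d ' ' -f 11)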
count=0

while [ "$status" == "DOWN" ]
do
	((count+=1))
	ifdown "$interface"
	sleep 3
	ifup "$interface"
	sleep 3
	status=$(ip link show "$interface" | grep -v link/ether | cut -d ' ' -f 11)
	echo "Try: $count; Status: $status"
done

msg="$interface up. Took $count tries. Started at $start"

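if [ "$count" -gt 0 ]
then
	log "$msg"
else
	# uncomment the next line to also log runs where nothing needed fixing
	#log "$interface was already up. No action taken"
	: # no-op; bash rejects an else branch that contains only comments
fi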
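
Two parts of Script #2 are fragile: the state is read from a fixed field position, and the loop retries forever if a port never comes back. Here is a hedged sketch of one way to soften both. It assumes the usual iproute2 output, where the word after `state` on the first line is the link state (for example, an extra `master br0` pair shifts every later field by two). `link_state`, `toggle_bounded`, and `MAX_TRIES` are names I made up for this example.

```shell
#!/bin/bash
# Sketch: position-independent state lookup plus a bounded retry loop.
# link_state, toggle_bounded and MAX_TRIES are hypothetical names.

# Read `ip link show <iface>` output on stdin and print the word after "state".
link_state () {
	awk 'NR == 1 { for (i = 1; i < NF; i++) if ($i == "state") { print $(i + 1); exit } }'
}

# Toggle the interface until it leaves DOWN or MAX_TRIES (default 5) is reached.
toggle_bounded () {
	local interface=$1 max_tries=${MAX_TRIES:-5} count=0 status
	status=$(ip link show "$interface" | link_state)
	while [ "$status" = "DOWN" ] && [ "$count" -lt "$max_tries" ]; do
		count=$((count + 1))
		ifdown "$interface"; sleep 3
		ifup "$interface"; sleep 3
		status=$(ip link show "$interface" | link_state)
	done
	echo "$interface: state=$status after $count tries"
	# succeed only if the interface is no longer DOWN
	[ "$status" != "DOWN" ]
}
```

Capping the retries means a permanently dead port no longer blocks the interfaces that come after it in Script #1; the next cron run simply tries again.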

The Log File

/root/int-toggle/toggle.log
# Sample Output (with the "already up" log line in Script #2 uncommented)
2023-08-10 15:26:01 enp1s0 was already up. No action taken
2023-08-10 15:26:01 enp2s0 was already up. No action taken
2023-08-10 15:26:01 enp3s0 was already up. No action taken
2023-08-10 15:26:01 enp4s0 was already up. No action taken
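
To see which port misbehaves most, the log can be tallied per interface. This is a minimal sketch that assumes the recovery lines keep the format Script #2 writes (`<date> <time> <iface> up. Took <n> tries. Started at ...`); `summarize_toggles` is a hypothetical helper name, and the default path is simply the log file used above.

```shell
#!/bin/bash
# Sketch: count recoveries and total toggle attempts per interface.
# Assumes recovery lines look like:
#   "<date> <time> <iface> up. Took <n> tries. Started at <date> <time>"
# summarize_toggles is a hypothetical helper name.

summarize_toggles () {
	awk '$4 == "up." && $5 == "Took" {
		events[$3]++       # recoveries seen for this interface
		tries[$3] += $6    # down/up cycles needed across all recoveries
	}
	END {
		for (i in events)
			printf "%s recoveries=%d tries=%d\n", i, events[i], tries[i]
	}' "${1:-/root/int-toggle/toggle.log}" | sort
}
```

Lines such as "was already up" or "aborting on interface" do not match the pattern, so they are ignored.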