Interface Keepalive
Introduction
I recently bought a mini-PC with an Intel I225-V network card. Because of it, I will always check for known NIC bugs before any future hardware purchase.
As it turns out, the I225-V suffers from well-known and widespread issues. In my experience, those issues have shown up as one of the NIC's four ports periodically just... stopping working. Since I'm using the box as an edge device, it's been frustrating, to say the least.
Intel has released an official fix for the card, but only for Windows. Ideally, an official fix will also be released for Debian (the OS I'm running), but I won't hold my breath. There are also plenty of articles and forum posts from other users trying to fix the issue on various Linux distros, but I have yet to find one that works. Perhaps I'll search again in the future. For now, however, I've found a (kludgy) way of working around the problem.
Scripting an Interface Reboot
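My solution has four components: a cron job, two scripts, and a log file. Every two minutes, cron runs the first script, which calls the second script once per physical interface. The second script checks the interface's link state and, if it is DOWN, bounces the interface with ifdown/ifup until it comes back up, appending a line to the log file whenever it had to intervene.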
The Cron Job
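# system crontab format (/etc/crontab or a file in /etc/cron.d/): each job has a user field
# cron's default PATH may not include /sbin, where ifup/ifdown live on Debian
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
SCRIPT=/root/int-toggle/toggle-physints.sh
# every 2 minutes, as root: run the script if it exists and is executable
*/2 * * * * root if [ -x "$SCRIPT" ]; then "$SCRIPT"; fi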
Script #1: toggle-physints.sh
#!/bin/bash
# toggle-physints.sh: call int-toggle.sh for each physical interface
LOGFILE=/root/int-toggle/toggle.log
interfaces="enp1s0 enp2s0 enp3s0 enp4s0" # replace this with your interface names
for interface in $interfaces; do
    /root/int-toggle/int-toggle.sh "$interface"
done
# optional logging
# date=$(date +'%Y-%m-%d %H:%M:%S')
# echo "$date: completed run" | tee -a "$LOGFILE"
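Hardcoding the interface list works, but it has to be edited whenever the names change. A minimal sketch of an alternative, assuming that physical NICs are the entries under /sys/class/net that have a "device" link (virtual interfaces such as lo, bridges, and VLANs typically don't). The function name list_physical_ifaces is hypothetical; it takes an optional sysfs directory so it can be tried against a scratch directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original setup): print the names of
# interfaces backed by a hardware device. Assumes physical NICs expose a
# "device" link under /sys/class/net/<name>, while virtual ones do not.
list_physical_ifaces () {
    local root=${1:-/sys/class/net}
    local path
    for path in "$root"/*; do
        # keep only entries that point at a real (PCI, USB, ...) device
        if [ -e "$path/device" ]; then
            basename "$path"
        fi
    done
}
```

In toggle-physints.sh, interfaces=$(list_physical_ifaces) could then replace the hardcoded list. Note that this would also pick up ports you may not want toggled, such as a Wi-Fi card.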
Script #2: int-toggle.sh
#!/bin/bash
# int-toggle.sh: toggle the interface given as $1 until its link comes back up
LOGFILE=/root/int-toggle/toggle.log

# append a timestamped message to the log (and echo it to stdout)
log () {
    local timestamp
    timestamp=$(date +'%Y-%m-%d %H:%M:%S')
    echo "$timestamp $1" | tee -a "$LOGFILE"
}

# delete this run's pidfile; registered as an EXIT trap below
remove_pid () {
    rm -f "$pidfile"
}
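interface=${1:?usage: int-toggle.sh <interface>}
pidfile=/tmp/toggle-${interface}.pid
# refuse to start if another run is still working on this interface
if [ -f "$pidfile" ]
then
    log "aborting on interface $interface. script already running."
    exit 0
fi
trap remove_pid EXIT
touch "$pidfile"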
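# print the link state (UP, DOWN, ...) that follows the "state" keyword in
# "ip link show"; its field position shifts with the interface's configuration
# (e.g. an extra "master br0" on a bridged port), so a fixed cut field is fragile
get_state () {
    ip link show dev "$1" | awk '{ for (i = 1; i < NF; i++) if ($i == "state") { print $(i + 1); exit } }'
}

start=$(date +'%Y-%m-%d %H:%M:%S')
status=$(get_state "$interface")
count=0
while [ "$status" == "DOWN" ]
do
    ((count+=1))
    ifdown "$interface"
    sleep 3
    ifup "$interface"
    sleep 3
    status=$(get_state "$interface")
    echo "Try: $count; Status: $status"
done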
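msg="$interface up. Took $count tries. Started at $start"
if [ "$count" -gt 0 ]
then
    log "$msg"
else
    # bash rejects an empty else branch, so ':' serves as a no-op;
    # swap in the log call to record quiet runs (as in the sample output)
    : # log "$interface was already up. No action taken"
fi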
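The pidfile check has two weak spots: the test-then-touch is not atomic, and a pidfile left behind by a run killed with SIGKILL (which skips the EXIT trap) blocks every later run for that interface. The loop also never gives up if the port stays down. A minimal sketch of how both could be addressed, assuming util-linux's flock(1) is installed (it is on Debian); MAX_TRIES, LOCKDIR, and the function names are hypothetical and not part of the setup above.

```shell
#!/bin/bash
# Hypothetical variant of int-toggle.sh's core logic (not the original script):
# an flock-based lock plus a retry cap. MAX_TRIES and LOCKDIR are assumptions.
MAX_TRIES=${MAX_TRIES:-10}
LOCKDIR=${LOCKDIR:-/run}

# print the link state that follows the "state" keyword in "ip link show"
link_state () {
    ip link show dev "$1" | awk '{ for (i = 1; i < NF; i++) if ($i == "state") { print $(i + 1); exit } }'
}

# bounce the interface the same way the original loop does; "9>&-" keeps the
# lock fd out of ifdown/ifup and any daemon they start (e.g. a DHCP client),
# so the lock is not held after this script exits
bounce () {
    ifdown "$1" 9>&-; sleep 3
    ifup "$1" 9>&-; sleep 3
}

# take a per-interface lock on fd 9; non-zero if another run holds it.
# The kernel drops the lock once the fd is closed, including when the process
# is killed, so no stale lock file can block later runs.
acquire_lock () {
    exec 9>"$LOCKDIR/int-toggle-$1.lock" || return 1
    flock -n 9
}

# bounce the interface until it is no longer DOWN, giving up after MAX_TRIES
toggle_until_up () {
    local iface=$1 tries=0
    while [ "$(link_state "$iface")" = "DOWN" ]; do
        if [ "$tries" -ge "$MAX_TRIES" ]; then
            echo "$iface still down after $tries tries; giving up" >&2
            return 1
        fi
        tries=$((tries + 1))
        bounce "$iface"
    done
    echo "$iface up after $tries tries"
}
```

A caller would run acquire_lock "$interface" || exit 0 and then toggle_until_up "$interface". Giving up after the cap hands the port back to the next cron run two minutes later, instead of leaving one run looping indefinitely.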
The Log File
# Sample Output
2023-08-10 15:26:01 enp1s0 was already up. No action taken
2023-08-10 15:26:01 enp2s0 was already up. No action taken
2023-08-10 15:26:01 enp3s0 was already up. No action taken
2023-08-10 15:26:01 enp4s0 was already up. No action taken
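Because recoveries are logged in a fixed format, the log also records how often each port drops. A small sketch for summarizing it, assuming the "<date> <time> <interface> up. Took <N> tries. ..." lines written by Script #2; the helper name summarize_toggle_log is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarize recoveries recorded in toggle.log.
# Only "<iface> up. Took <N> tries." lines are counted; "already up" and
# "aborting" lines are ignored.
summarize_toggle_log () {
    local logfile=${1:-/root/int-toggle/toggle.log}
    awk '$4 == "up." && $5 == "Took" { events[$3]++; tries[$3] += $6 }
         END { for (i in events) printf "%s: %d recoveries, %d total tries\n", i, events[i], tries[i] }' \
        "$logfile" | sort
}
```

Run with no argument, it reads /root/int-toggle/toggle.log and prints one line per interface in the form "<interface>: <recoveries> recoveries, <tries> total tries". awk's for-in order is unspecified, hence the sort.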