Cluster Update: New Apartment, New Network Hell
I am so excited! For the first time, I am living in a full-fledged apartment instead of a dorm room. I get everything moved in and enjoy my first day relaxing. The next day, I attempt to bring my computer cluster online... No luck!
When I power it up and connect it to an Ethernet port, there are no blinking LED lights on the port. Well, the port is dead, right? Nope! Connecting other devices to the port works just fine. WTF?!
I console into the Espressobin board only to see no link is detected. This is funny because, during power-up, the link lights on the port stay active for a few seconds and then shut off. Ok, I have a wild idea!
From a few random conversations in the past, I know that our college campus network uses PacketFence. It automates managing the network and can also "jail" (isolate) misbehaving devices. My Espressobin definitely fits this category: it runs its own DHCP server, which should only serve the internal cluster network. However, if that DHCP server were somehow misconfigured, the campus network would no doubt issue an "arrest warrant" and "jail" my cluster.
All of my network and DHCP configurations begin with the rc.local file (super stable, right!). I disable the rc.local script and run each command within it after boot. Still no luck, ahhhhhhhhhhhhh!
Maybe it's dhclient. One hour of debugging later, still nothing.
For laughs and giggles, I plug a USB Ethernet adapter into the Espressobin.
Espressobin --> USB Ethernet --> Campus Network
It works! What in the world!!!!!!!!!!!!! Ok, this narrows down the problem to the exact port. It has nothing to do with my device "misbehaving", or a misconfigured DHCP client/server.
Side note: I tested the cluster on a different network, so this confirms it's not physical damage to the port.
One lshw -C network command later, I found something. It could be nothing (again) or it could be everything. The lshw -C network output showed the link speed to be 1Gbps. This stood out to me because I had run a speed test on the same port with my laptop, and I was a little disappointed that the apartment topped out at 100Mbps, unlike the 1000Mbps dorm link.
To continue the investigation, I connected a different Linux board to the port: the Odroid XU4. It is also capable of 1Gbps. However, its lshw -C network report showed its speed to be 100Mbps.
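If lshw isn't installed, the kernel also exposes the current link speed through sysfs. Here is a minimal sketch of a helper that reads it. The function name link_speed and its optional second argument (an alternate sysfs root, handy for testing) are my own inventions; the interface name wan is the one used in the ethtool command in this post.

```shell
# Print the current link speed (in Mb/s) of a network interface.
# $1: interface name, e.g. wan
# $2: optional sysfs root (hypothetical override, defaults to /sys/class/net)
link_speed() {
    iface="$1"
    root="${2:-/sys/class/net}"
    # Reading the speed file fails, or yields -1, when there is no usable link.
    if speed=$(cat "$root/$iface/speed" 2>/dev/null) && [ "$speed" -gt 0 ] 2>/dev/null; then
        echo "$iface: ${speed}Mb/s"
    else
        echo "$iface: no link or speed unknown"
    fi
}

# Usage: link_speed wan
```

Running it on both boards against the same wall port would give a quick side-by-side of what each one ended up with.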
From random background knowledge, I get the gist that Gigabit Ethernet uses the wires in the Ethernet cable differently than 100Mbps does: it needs all four twisted pairs, while 100Mbps only uses two. This should all be auto-negotiated when the link is first established. However, this isn't happening with my Espressobin.
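To see what the negotiation actually settled on, running ethtool with just the interface name prints the link settings, including Speed, Duplex, Auto-negotiation and Link detected lines. The sketch below trims that output down to those four fields. The helper name summarize_link is made up for illustration, and it assumes the usual "Field: value" layout of ethtool's output.

```shell
# Reduce `ethtool <iface>` output (read from stdin) to the fields that
# matter for a negotiation problem, printed as key=value lines.
summarize_link() {
    awk -F': ' '
        /^[ \t]*(Speed|Duplex|Auto-negotiation|Link detected):/ {
            sub(/^[ \t]+/, "", $1)   # strip the leading indentation
            printf "%s=%s\n", $1, $2
        }'
}

# Usage: ethtool wan | summarize_link
```

The full ethtool output also lists the supported and advertised link modes, which is where a mismatch between the two ends of the cable would show up.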
Another Google search later, I forcefully set the Espressobin link to 100Mbps using ethtool. It works! I have freakin blinking lights. It's a pure dopamine rush!
I add this command to my rc.local to keep the setting across reboots.
ethtool -s wan speed 100 duplex full autoneg on
Yeah, I know I'll need to remove that line from the rc.local when I move back to a 1Gbps link.
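One way to avoid hand-editing rc.local on the next move might be to apply the 100Mbps restriction only when the port has no link. This is an untested sketch, not what I actually run: the function name apply_fallback_if_no_link, the optional sysfs-root argument, and the ETHTOOL variable (which lets the command be swapped out for a dry run) are all assumptions of mine. It also assumes the interface has been brought up and given a few seconds to negotiate before the check.

```shell
# Restrict the port to 100Mb/s full duplex only when no link came up by itself.
# $1: interface name, e.g. wan
# $2: optional sysfs root (hypothetical override, defaults to /sys/class/net)
apply_fallback_if_no_link() {
    iface="$1"
    root="${2:-/sys/class/net}"
    # carrier reads 1 when a link is established; treat read errors as no link.
    carrier=$(cat "$root/$iface/carrier" 2>/dev/null || echo 0)
    if [ "$carrier" = "1" ]; then
        echo "link up on $iface, leaving autonegotiation alone"
    else
        echo "no link on $iface, restricting to 100Mb/s full duplex"
        ${ETHTOOL:-ethtool} -s "$iface" speed 100 duplex full autoneg on
    fi
}

# Usage in rc.local, after the interface is up: sleep 5; apply_fallback_if_no_link wan
```

On a working 1Gbps link the check would pass and the ethtool line would never run, so the workaround could stay in place.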