FWIP - FeatherWeight IP for STM32

I’m working on a project I’m calling FWIP, or FeatherWeight Internet Protocol, targeted at the STM32F429 processor on a commercially available Nucleo-F429 dev board. The hope is that when it’s done it will be the simplest thing possible that talks on an IPv4 network, and that it can go up on GitHub.

And as of today… it pings, which means the basic Tx/Rx channels are working!

Some features/goals:

  • Provide a simple, minimalist IPv4 Ethernet stack that can talk online (hardware drivers, ETH, ARP, UDP, TCP, and DHCP, with basic sockets). It’s meant to be a starting framework that you build the rest of your project on top of and modify to support just what you need.

  • I want to document this thing really well once it’s fully up and running. IPv4 is not a complex set of protocols, it’s just tricky to get into because you have to read a bunch of dry RFC specs and pull info together from lots of sources. Once you see the big picture of how it all works it’s actually quite simple and elegant. (Stevens’ “TCP/IP Illustrated, Volume 1” is a great resource.)

  • I put all the hardware-specific parts in one file. That should be the only file you need to modify to port it to another processor.

  • It has a pretty good (IMO) trace functionality that can be turned on to log activity out the debug port, or turned off in a way that compiles to zero extra code.
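For anyone wondering how a trace can compile to zero extra code, here is a minimal sketch of one common approach, not necessarily how FWIP does it. The names `FWIP_TRACE`, `TRACE_FN`, `TRACE_NOTE`, and `traceLog` are made up for illustration, and the string buffer is a host-side stand-in for the debug port.

```cpp
#include <cassert>
#include <string>

// FWIP_TRACE is a hypothetical build switch; define it as 0 to strip tracing.
#ifndef FWIP_TRACE
#define FWIP_TRACE 1
#endif

#if FWIP_TRACE
// Host-side stand-in for the debug port: trace text accumulates in a string.
inline std::string& traceLog() {
    static std::string log;
    return log;
}
// Function names go at column 0, notes are indented two spaces.
// Both macros expect string literals so the newline can be glued on at compile time.
#define TRACE_FN(name)   (traceLog() += name "\n")
#define TRACE_NOTE(text) (traceLog() += "  " text "\n")
#else
// With tracing off the macros expand to an empty expression, so neither the
// calls nor the string literals end up in the binary.
#define TRACE_FN(name)   ((void)0)
#define TRACE_NOTE(text) ((void)0)
#endif

// Two illustrative functions that trace themselves.
void arpSend() {
    TRACE_FN("arp::send");
    TRACE_NOTE("->broadcast");
}

void ethSend() {
    TRACE_FN("eth::send");
    arpSend();
}
```

Calling `ethSend()` with tracing on appends three lines to the log: `eth::send`, `arp::send`, and an indented `->broadcast`. Building with `-DFWIP_TRACE=0` leaves the two functions with empty bodies apart from the call.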

Current status: About 75% done. The hardware drivers, ETH, ARP, UDP and sockets are functional. Starting work on DHCP now. Once that’s done and tested I hope to do a first public release, then go to work on TCP as that will be a big task.

Here’s my project map. The blue, green, and black boxes are done and about half the grey boxes are working:

I guess this is the first post in this sub-forum, so we don’t really have a set way to do things yet. I’m just gonna live-blog the development in this thread if that’s cool and there is any interest.


These are the kind of libraries these dev kits really need. Stripped down to just the bare bones and not full of bloat.

When you say port to another processor, what does the processor need? What kind of peripherals does Ethernet require? DMA for streaming or LVDS drivers for the pairs?

The blue boxes in the project map above are external components, in this case an RJ45 jack to plug the Ethernet cable into and an external physical layer, or PHY, chip (LAN8742A) that converts the analog Ethernet signals to a set of digital inputs to the STM32 processor. That PHY chip handles all the LVDS, encoding, and even negotiation with the other end of the connection to work out things like 10/100 Mbit speed or full/half duplex. PHY chips do an amazing amount of stuff for around $1 in quantity.

The green boxes are the on-chip STM32 Ethernet peripheral and that’s what would need to get re-worked for a port. The ETH block is kind of like an Ethernet co-processor. You set it up with a MAC address and it listens to passing traffic until it senses a packet that is addressed to your unit. Then it starts streaming the packet into an internal Rx buffer while checking for data integrity and calculating all the checksums built into IPv4 packets. If all tests pass, the DMA (which you also have to configure) moves the packet data into general SRAM in a pre-determined location. Only when that is done does the system fire off a hardware interrupt (the black ISR box) to let the CPU know it has data.

After the data is in SRAM it’s pretty much a software problem of what to do with it (grey boxes) and that shouldn’t change much with different hardware.

OK, that makes sense then. So you will need that kind of internal hardware in your MCU to port. Sounds like if you were doing Ethernet you would make sure to have erm :slight_smile:

I made some progress on the DHCP system. DHCP stands for Dynamic Host Configuration Protocol, and it is basically what a computer does when it first connects to a network to get an IP address, gateway, subnet mask, and all the other info required to talk online.

I got far enough into understanding DHCP that I can hand construct an info request packet, bounce it off an open source DHCP server that I set up, and get a response. Next steps will be setting up a tiny state machine that works through the DHCP protocol, requesting data and acknowledging it as it arrives.
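To give a feel for what hand-constructing a DHCP packet involves, here is a rough sketch that fills in a minimal DHCPDISCOVER using the RFC 2131 field layout. It is an illustration only: `buildDhcpDiscover` is a hypothetical helper rather than FWIP code, and the choice of DISCOVER with a two-entry parameter request list is an assumption about what a first request might look like.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed BOOTP header (236 bytes) plus the 4-byte magic cookie, per RFC 2131.
constexpr std::size_t kDhcpFixedSize = 240;

// Fills buf with a minimal DHCPDISCOVER payload (the part that rides inside
// UDP, client port 68 to server port 67) and returns the number of bytes used.
std::size_t buildDhcpDiscover(std::array<std::uint8_t, 300>& buf,
                              const std::array<std::uint8_t, 6>& mac,
                              std::uint32_t xid) {
    buf.fill(0);          // ciaddr, yiaddr, siaddr, giaddr, sname, file stay zero
    buf[0] = 1;           // op: BOOTREQUEST
    buf[1] = 1;           // htype: Ethernet
    buf[2] = 6;           // hlen: MAC address length
    // xid: transaction ID chosen by the client, big-endian on the wire
    buf[4] = static_cast<std::uint8_t>(xid >> 24);
    buf[5] = static_cast<std::uint8_t>(xid >> 16);
    buf[6] = static_cast<std::uint8_t>(xid >> 8);
    buf[7] = static_cast<std::uint8_t>(xid);
    buf[10] = 0x80;       // flags: ask the server to broadcast its reply
    for (std::size_t i = 0; i < mac.size(); ++i) {
        buf[28 + i] = mac[i];   // chaddr: client hardware address
    }
    // Magic cookie that marks the start of the DHCP options area.
    buf[236] = 0x63; buf[237] = 0x82; buf[238] = 0x53; buf[239] = 0x63;

    std::size_t n = kDhcpFixedSize;
    buf[n++] = 53; buf[n++] = 1; buf[n++] = 1;   // option 53: message type = DISCOVER
    buf[n++] = 55; buf[n++] = 2;                 // option 55: parameter request list
    buf[n++] = 1;                                //   subnet mask
    buf[n++] = 3;                                //   router (gateway)
    buf[n++] = 255;                              // end of options
    return n;
}
```

The state machine mentioned above would then swap the message type byte as it moves from DISCOVER to REQUEST and check the same `xid` on every reply.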

I have a little trace system set up so you can watch the packets zing around. It’s pretty essential for something so interrupt oriented. It lists the name of each function as it executes and notes any significant decisions made along the way. For example, the DHCP transmit process currently looks like this:

dhcp::updateCallback
udp::send
packet::writeHeaderlessTxPacket
  pkt = addr:0x20001904  head:0x200014bc  tail:0x2000149c  next:0x00000000
eth::send
arp::send
  ->broadcast
hardware::txPacket
  Tx DMA is suspended, restarting it 
  txPacketISR
    placed one packet on tx DMA chain

ETH_IRQHandler
  NIS
  txPacketISR
    empty queue

dhcp::updateCallback is a dummy routine that gets called by a timer every 10 seconds and sets up a DHCP request structure. It then dumps that data into its UDP port, which formats it as an IP packet and passes it downstream to the hardware interface code. That code puts the packet into a Tx queue and notices that the Tx DMA is not active, so it starts it up, causing the packet to get sent out the Ethernet port. When the Tx DMA finishes sending, it fires off a hardware interrupt (ETH_IRQHandler) that checks to see if any more packets are ready to send. In this case the Tx queue is empty so the Tx DMA can go back to sleep.

The receive packet path works pretty much just like this but in reverse!
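The transmit flow described above can be modelled on a desktop in a few lines. This is a simplified sketch, not the FWIP source: `TxPath`, `dmaActive`, and `onTxComplete` are invented names, the "DMA" is just a flag, and a real driver would also need to mask the interrupt while the main thread touches the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Host-side model of the Tx queue and the suspend/restart behaviour.
struct TxPath {
    std::queue<std::vector<std::uint8_t>> pending;  // packets waiting for the DMA
    bool dmaActive = false;                         // stand-in for the DMA status
    std::size_t sent = 0;                           // packets handed to the DMA

    // Shared by the send path and the interrupt: hand over one packet if any.
    void txPacketISR() {
        if (pending.empty()) {
            dmaActive = false;   // empty queue, DMA goes back to sleep
            return;
        }
        pending.pop();           // real code would attach the buffer to a descriptor
        dmaActive = true;
        ++sent;
    }

    // Called by the stack to send a packet.
    void txPacket(std::vector<std::uint8_t> pkt) {
        pending.push(std::move(pkt));
        if (!dmaActive) {
            txPacketISR();       // Tx DMA is suspended, restart it
        }
    }

    // Models ETH_IRQHandler firing when a transmission completes.
    void onTxComplete() { txPacketISR(); }
};
```

Sending one packet on an idle path hands it straight to the DMA. A second packet sent while the DMA is busy waits in the queue until `onTxComplete()` picks it up, and one more `onTxComplete()` with nothing queued puts the DMA back to sleep.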

The good news is that since last week I got a decently working DHCP client up and running, and that will be a good thing to have for the eventual general FWIP release.

The bad news is that for my specific personal project I want to have a bunch of FWIP-STM32 nodes talking to a National Instruments cRIO RTOS Linux machine, but it turns out there isn’t a DHCP server module for that system. My bad for just assuming that would exist.

So how can I get IP addresses assigned for a bunch of FWIP-STM32 nodes? It turns out there’s a standard called IPv4 Link-Local (RFC 3927) that should work pretty well.

It works by having each node power on and then wait one second so everyone has time to boot up, then an additional random time between 0-2 seconds so they don’t all start talking at once.

During that wait time each node makes up a random IP address in the range from 169.254.1.0 to 169.254.254.255 inclusive. When it’s its turn to talk, it sends out a broadcast inquiry to the whole local subnet asking if anyone else is using its freshly made-up address. If nobody has claimed it, the node announces that it is claiming the address and then responds if any later nodes ask whether it’s in use. If some other node is already using the address, it makes up a different one and tries again. When the dust settles every node on the local network should have a unique address!
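The address pick and the start-up delay described above are easy to sketch. The helper names `pickLinkLocalAddress` and `startupDelayMs` are hypothetical, and `std::mt19937` stands in for whatever random source the target actually has. RFC 3927 suggests seeding from something unique to the node, such as the MAC address, so that two nodes do not generate the same sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// 169.254.1.0 and 169.254.254.255 as host-order 32-bit values (RFC 3927 range).
constexpr std::uint32_t kLinkLocalFirst = 0xA9FE0100u;
constexpr std::uint32_t kLinkLocalLast  = 0xA9FEFEFFu;

// Picks a candidate address uniformly from the allowed range, inclusive.
template <class Rng>
std::uint32_t pickLinkLocalAddress(Rng& rng) {
    std::uniform_int_distribution<std::uint32_t> dist(kLinkLocalFirst, kLinkLocalLast);
    return dist(rng);
}

// Start-up delay in milliseconds: one fixed second so everyone can boot,
// plus a random 0 to 2 seconds so the nodes do not all talk at once.
template <class Rng>
std::uint32_t startupDelayMs(Rng& rng) {
    std::uniform_int_distribution<std::uint32_t> jitter(0u, 2000u);
    return 1000u + jitter(rng);
}
```

A node would call `pickLinkLocalAddress` again with the same generator whenever its probe finds the address already taken.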

Routers are not supposed to forward any packets in the 169.254.x.x address range since that would obviously cause problems for the internet at large. But my system is isolated and air-gapped from the rest of the world so that should be fine.

I’m going to follow your project closely. I’ve recently been working on a project using Ethernet, but my client is using the WIZnet chip where the IP stack is already built in. It’s much harder to find good examples for that arrangement (but there are some around, including from WIZnet (C) and the Arduino world (C++)).

I want to eventually use the built in Ethernet capability found on many members of the STM32 family.

Keep up the good work.
-Chris

I had to take a bit of a break from this project to work on other stuff but got time to come back to it today.

The work was mainly on the server side, sending out a command and waiting for the response from the STM32. The image below is an oscilloscope trace of the incoming packet (top two lines) at the STM32 pins vs. the outgoing response (bottom two lines) from the STM32. It takes just under 19 µs, or 3420 ticks of the 180 MHz STM32 clock, to process the packet and write a response. Some of that time is wasted between the IRQ getting the actual data and the main thread getting around to processing it.

The traces are from the RMII (Reduced Media-Independent Interface) pins, which take in two bits at a time on a 50 MHz clock to match the 100 Mbit/s data rate of the Ethernet PHY. They are kind of ringy since the dev board breaks out each pin to a giant header, introducing inductance that wouldn’t be there on a real system with more point-to-point trace routing.
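The two numbers quoted above are quick to sanity-check at compile time. This is just the arithmetic from the post written out, with made-up constant names:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kCpuHz = 180'000'000;   // 180 MHz STM32 core clock

// Converts a duration in nanoseconds to CPU clock ticks.
constexpr std::uint64_t cyclesFor(std::uint64_t nanoseconds) {
    return kCpuHz * nanoseconds / 1'000'000'000;
}

// 19 us of turnaround time at 180 MHz.
static_assert(cyclesFor(19'000) == 3420, "19 us at 180 MHz is 3420 ticks");

// RMII moves 2 bits per cycle of its 50 MHz reference clock.
constexpr std::uint64_t kRmiiBitsPerClock = 2;
constexpr std::uint64_t kRmiiClockHz = 50'000'000;
static_assert(kRmiiBitsPerClock * kRmiiClockHz == 100'000'000,
              "RMII throughput matches 100 Mbit/s Ethernet");
```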
