Linux kernel TCP smoothed-RTT estimation

Recently I decided to look under the hood to see exactly how srtt (smoothed RTT) is calculated in Linux. The srtt calculation itself, an Exponentially Weighted Moving Average (EWMA), is rather straightforward. What goes into that calculation under various scenarios is more interesting, and it matters a great deal for getting a correct RTT estimate.

It is also useful to note how Linux and FreeBSD differ here. Whenever possible, Linux avoids trusting the value carried in the TCP Timestamps option, because middle-boxes can meddle with it.

The basic algorithm is:
For non-retransmitted packets, use the saved packet send timestamp and the ack arrival time.
For retransmitted packets, use the Timestamps option. If that option is not enabled, no RTT is calculated for such packets.

Let’s look at the code. I am using net-next.
When a TCP sender sends packets, it has to wait for acks for those packets before throwing them away. Until then it keeps them in a queue called the 'retransmission queue'.
When sent packets get acked, tcp_clean_rtx_queue() gets called to clear those packets from the retransmission queue.

A few useful variables in that function are:
seq_rtt_us – RTT measured using the first packet of the acked range
ca_rtt_us – RTT measured using the last packet of the acked range (mainly used for congestion control)
sack_rtt_us – RTT measured from a SACKed (selectively acknowledged) packet
tcp_mstamp is a tcp_sock member that holds the timestamp of the most recent packet received or sent. It gets updated by tcp_mstamp_refresh().

For a clean ack (not a sack) of a single packet, seq_rtt_us = ca_rtt_us, as there is no range.

If such a clean ack is also for a non-retransmitted packet,

seq_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, first_ackt);

and for a sack which is again for a non-retransmitted packet,

sack_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, sack->first_sackt);

The code that sets sack->first_sackt is in tcp_sacktag_one(). It populates the field when the sack is for a non-retransmitted packet.

tcp_stamp_us_delta() returns the difference, in microseconds, between two timestamps maintained by the stack. Here those are the current tp->tcp_mstamp and the saved send time.

Next, tcp_ack_update_rtt() gets called, which starts out with:

/* Prefer RTT measured from ACK's timing to TS-ECR. This is because
 * broken middle-boxes or peers may corrupt TS-ECR fields. But
 * Karn's algorithm forbids taking RTT if some retransmitted data
 * is acked (RFC6298).
 */
if (seq_rtt_us < 0)
        seq_rtt_us = sack_rtt_us;

For acks that acknowledge retransmitted packets, seq_rtt_us would be negative.
If there is also a sack RTT sample from a non-retransmitted packet, the code uses that sample instead, because it is a valid and useful measurement.

It then falls back to the timestamps provided by the Timestamps option (TS-ECR), but only if seq_rtt_us is still negative:

if (seq_rtt_us < 0 && tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
    flag & FLAG_ACKED) {
        u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
        u32 delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
 
        seq_rtt_us = ca_rtt_us = delta_us;
}

By this point there is a seq_rtt_us value that can be fed into tcp_rtt_estimator(). That function generates the smoothed RTT, more or less following Van Jacobson's SIGCOMM '88 paper.

FreeBSD Project to participate in Google Summer of Code 2018

The FreeBSD Project is pleased to announce its participation in Google's 2018 Summer of Code program, which funds summer students to participate in open source projects. This will be the FreeBSD Project's fourteenth year in the program, having mentored over 210 successful students through summer-long coding projects between 2005 and 2017.

FOSDEM 2018 | BSD Now 232

We talk about our recent trip to FOSDEM, we discuss the pros & cons of permissive licensing, cover the installation of OpenBSD on a dedibox with full-disk encryption, the new Lumina guide repository & we explain ZFS vs. OpenZFS.

Response zones in BIND (RPZ/Blocking unwanted traffic).

A while ago, my dear colleague Mattijs came up with an interesting option in BIND: Response Policy Zones (RPZ). One can create custom "zones" and enforce a policy on them.

I had never worked with it before, so I had no clue at all what to expect from it. Mattijs told me how to configure it (see below for an example) and offered to let me slave his RPZ policy-domains.

All of a sudden I was no longer getting a lot of ads, spam and other things. It was filtered. Wow!

His RPZ zones were custom-made and based on PiHole. PiHole adds hosts to the local "hosts" file and points them at 127.0.0.1 (your local machine), which prevents requests from reaching the actual server at all. RPZ policies are much stronger and more dynamic than that.

RPZ policies also let you "redirect" queries. What do I mean by that? You can add an advertisement (AD for short) site/domain to the RPZ policy and return NXDOMAIN, so it no longer exists for the end user. But you can also CNAME it to a domain/host you own, put a webserver on that host, and show the user who queried the page a message like: "The site you are trying to reach has been proactively blocked by the DNS software. This is an automated action and an automated response. If you feel that this is not appropriate, please let us know on <mail link>".

Once I noticed that, I immediately saw the benefit for companies, and most likely for schools and home users as well. Mattijs had a busy time at work and I was recovering from health issues, so I had "plenty" of time to investigate and read up on this. The RPZ policies were not updated often, and they caused some problems, for example for my e-readers (they used msftncsi.com; see another post on this website where I am grumpy about that). I also wanted to learn more about RPZ. So what did I do?

Yes, I wrote my own parser. In Perl. I wrote an "rpz-generator" (that is actually its name). I added the sources Mattijs used and generated my own files. They are rather huge, since I blocked ads, malware, fraud, exploits, Windows stuff and various other things (gambling, fake news, and stuff like that).

I also included some whitelists. msftncsi.com had been added to the lists and it made my e-readers go berserk. We also play a few games here and there that use some advertisement sites, so we wanted to exempt those as well. It's better to know which ones they are and selectively allow them than to allow traffic to every data collector out there.

This works rather well. I do not get a lot of complaints that things are not working, and I do see a lot of queries going to "banned" sites every day, so it is doing something. The most obvious side effect is that search results on Google are not always clickable. The results marked [ADV] lead to Google-sponsored advertising sites. Those sites are on the list, as are google-analytics and the like, so the results are blocked. Apart from those [ADV] results, it doesn't do much harm to our internet surfing or user experience. My wife sometimes wants to click on them when she searches for something that happens to be on that list, but otherwise we are doing just fine.

One thing, though: I built my setup, and wrote this article, using "NXDOMAIN", which just gives back "site does not exist" messages. I want to make my script smarter by making the action selectable, so that some categories are CNAMEd to a filtering domain and webpage and others are NXDOMAIN'ed. If someone has experience with that, please share some ideas: what it looks like, and whether your end users can do something with it or not. I think schools will be happy to present a block page instead of NXDOMAIN'ing some sites 🙂

Acknowledgements: Mattijs for teaching and showing me RPZ, ISC for putting RPZ in named, and zytrax.com for its excellent documentation on RPZ. The Perl developers for having such a great tool around, and the various sites I use to get the blocklists from. Thank you all!

If you want to know more about the tool, please contact me and we can share whatever information is available 🙂

BSDCan 2018

BSDCan 2018 (https://www.bsdcan.org/2018/), University of Ottawa, Ottawa, Canada 06 - 09 June, 2018. A four day BSD conference held in Ottawa, Canada. BSDCan hosts talks and tutorials on a range of topics based around the BSD family of operating systems.

About the Meltdown and Spectre attacks

About the Meltdown and Spectre attacks: FreeBSD was made aware of the problems in late December 2017. We're working with CPU vendors and the published papers on these attacks to mitigate them on FreeBSD. Due to the fundamental nature of the attacks, no estimate is yet available for the publication date of patches.

AsiaBSDCon 2018

AsiaBSDCon 2018 (https://2018.asiabsdcon.org/), Tokyo University of Science, Tokyo, Japan 08 - 11 March, 2018. AsiaBSDCon is a conference for users and developers on BSD based systems. The conference is for anyone developing, deploying and using systems based on FreeBSD, NetBSD, OpenBSD, DragonFlyBSD, Darwin and MacOS X. AsiaBSDCon is a technical conference and aims to collect the best technical papers and presentations available to ensure that the latest developments in our open source community are shared with the widest possible audience.

FOSDEM 2018

FOSDEM 2018 (https://fosdem.org/2018/), Free University of Brussels, Solbosch Campus, Brussels, Belgium 03 - 04 February, 2018. FOSDEM is an event centered on free and open-source software development. As with many other communities, the BSDs have an allotted room for giving talks at the event. See the schedule on the FOSDEM website for the talks held in the BSD Developer room. The FreeBSD Foundation will also be present to share information about the project.