Overview
tcpreplay has evolved quite a bit over the years. In the 1.x days, it merely read packets and sent them back on the wire. In 2.x, tcpreplay was enhanced significantly to add various rewriting features, but at the cost of complexity, performance and bloat. Now in 3.x, tcpreplay has returned to its roots as a lean packet sending machine, and the editing functions have moved to tcprewrite.
Basic Usage
To replay a given pcap file exactly as it was captured, all you need to do is specify the pcap file and the interface to send the traffic out of (in this case, 'eth0'):
# tcpreplay --intf1=eth0 sample.pcap
Replaying at different speeds
You can also replay the traffic at a different speed than it was originally captured at. Some examples:
To replay traffic as quickly as possible:
# tcpreplay --topspeed --intf1=eth0 sample.pcap
To replay traffic at a rate of 10Mbps:
# tcpreplay --mbps=10.0 --intf1=eth0 sample.pcap
To replay traffic 7.3 times as fast as it was captured:
# tcpreplay --multiplier=7.3 --intf1=eth0 sample.pcap
To replay traffic at half-speed:
# tcpreplay --multiplier=0.5 --intf1=eth0 sample.pcap
To replay at 25 packets per second:
# tcpreplay --pps=25 --intf1=eth0 sample.pcap
To replay packets one at a time, decoding each packet as it is sent (useful for debugging purposes):
# tcpreplay --oneatatime --verbose --intf1=eth0 sample.pcap
Replaying files multiple times
Using the --loop flag, you can specify that a pcap file will be sent two or more times:
To replay the sample.pcap file 10 times:
# tcpreplay --loop=10 --intf1=eth0 sample.pcap
To replay the sample.pcap file infinitely, or until CTRL-C is pressed:
# tcpreplay --loop=0 --intf1=eth0 sample.pcap
If the pcap files you are looping are small enough to fit in available RAM, consider using the --enable-file-cache option. This option caches each packet in RAM so that subsequent reads don't have to hit the slower disk. It does have a slight performance hit for the first iteration of the loop since it has to call malloc() for each packet, but after that it seems to improve performance by around 5-10%. Of course if you don't have enough free RAM, then this will cause your system to swap which will dramatically decrease performance.
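For example, to loop a file indefinitely with file caching enabled:
# tcpreplay --loop=0 --enable-file-cache --intf1=eth0 sample.pcap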
Another useful option is --quiet. This suppresses printing to the screen each time tcpreplay starts a new iteration, which can provide a dramatic performance boost on systems with slower consoles.
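For example, to loop quietly ten times:
# tcpreplay --quiet --loop=10 --intf1=eth0 sample.pcap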
Advanced Usage
Splitting Traffic Between Two Interfaces
By utilizing tcpprep cache files, tcpreplay can split traffic between two interfaces. This allows tcpreplay to send traffic through a device and emulate both client and server sides of the connection, thereby maintaining state. Using a tcpprep cache file to split traffic between two interfaces (eth0 & eth1) with tcpreplay is simple:
# tcpreplay --cachefile=sample.prep --intf1=eth0 --intf2=eth1 sample.pcap
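If you don't already have a cache file, one way to generate one is with tcpprep's auto/bridge mode (a sketch; see the tcpprep documentation for the mode that best matches your traffic):
# tcpprep --auto=bridge --pcap=sample.pcap --cachefile=sample.prep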
Viewing Packets as They are Sent
The --verbose flag turns on basic tcpdump decoding of packets. If you would like to alter the way tcpreplay invokes tcpdump to decode packets, you can use the --decode flag; see the tcpdump(1) man page for the options you can pass to it. Note: use of the --verbose flag is not recommended when performance is important.
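For example, to ask tcpdump to print link-level headers and skip DNS lookups, something like the following should work (the option string is handed through to tcpdump as-is):
# tcpreplay --verbose --decode="-e -n" --intf1=eth0 sample.pcap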
Choosing a Timing Method
tcpreplay now supports two methods for creating delays between packets:
- nanosleep() (default)
- gettimeofday()
The important thing to understand is that nanosleep() isn't always very accurate. Linux 2.4 and 2.6 kernels, for example, are only accurate to 10ms, so any given packet may be sent up to 10ms too early or too late. The result is that nanosleep() may not provide the accuracy required for all situations. Specifying the --accurate flag to tcpreplay switches to calling gettimeofday() in a loop. The result is much better accuracy (~1ms), but higher CPU utilization, since tcpreplay busy-waits instead of sleeping between packets.
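For example, to replay at 100 packets per second using the more accurate timing method:
# tcpreplay --accurate --pps=100 --intf1=eth0 sample.pcap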
Of course, neither method is really sufficient for all situations. If you wanted to send 4,000 packets per second, that would require sending a packet every 0.25ms, which requires a higher-resolution timer than either method provides. See ticket #41 for more details.
Tuning for High-Performance
Regardless of the size of physical memory, UNIX kernels will only allocate a static amount for network buffers. This includes packets sent via the "raw" interface, like with tcpreplay. Most kernels will allow you to tweak the size of these buffers, drastically increasing performance and accuracy.
NOTE: The following information is provided based upon my own experiences or the reported experiences of others. Depending on your specific hardware and configuration, it may or may not work for you. It may even make your system horribly unstable, corrupt your hard drive, or worse.
NOTE: Different operating systems, network card drivers, and even hardware can have an effect on the accuracy of packet timestamps that tcpdump or other capture utilities generate. And as you know: garbage in, garbage out.
NOTE: If you have information on tuning the kernel of an operating system not listed here, please send it to me so I can include it.
General Tips
- Use a good network card. This is probably the most important buying decision you can make. I recommend Intel e1000 series cards. El-cheapo cards like Realtek are known to give really crappy performance.
- Tune your OS. See below for recommendations.
- Faster is better. If you want really high-performance, make sure your disk I/O, CPU and the like is up to the task.
- For more details, check out the FAQ
- If you're looping file(s), make sure you have enough free RAM for the pcap file(s) and use --enable-file-cache
- Use --quiet (see the combined example after this list)
- Do not use ./configure --enable-tcpreplay-edit
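Putting several of these tips together, a quiet, looping, high-throughput run might look like this:
# tcpreplay --topspeed --loop=0 --enable-file-cache --quiet --intf1=eth0 sample.pcap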
Linux 2.4.x
The following is known to apply to the 2.4.x series of kernels and may work with 2.6.x (I haven't bothered to try yet). If anyone has any information regarding other kernel versions, please let me know. By default, tcpreplay's performance on Linux isn't all that stellar. However, with a simple tweak, relatively decent performance can be had on the right hardware. By default, Linux specifies a 64K buffer for sending packets. Increasing this buffer to about half a megabyte does a good job:
echo 524287 >/proc/sys/net/core/wmem_default
echo 524287 >/proc/sys/net/core/wmem_max
echo 524287 >/proc/sys/net/core/rmem_max
echo 524287 >/proc/sys/net/core/rmem_default
On one system, we've seen a jump from 23.02 megabits/sec (5560 packets/sec) to 220.30 megabits/sec (53212 packets/sec), which is nearly a 10x increase in performance. Depending on your system and capture file, your results may vary.
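Note that values written under /proc do not survive a reboot. On Linux systems with sysctl(8), you can set the same values with sysctl -w, or add them to /etc/sysctl.conf to make them permanent:
# sysctl -w net.core.wmem_default=524287
# sysctl -w net.core.wmem_max=524287
# sysctl -w net.core.rmem_max=524287
# sysctl -w net.core.rmem_default=524287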
*BSD
*BSD systems typically allow you to specify the size of network buffers with the NMBCLUSTERS option in the kernel config file. Experiment with different sizes to see which yields the best performance. See the options(4) man page for more details.
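For example, a FreeBSD kernel configuration entry might look like the following (the value shown is purely illustrative; experiment to find what works best on your hardware):
options NMBCLUSTERS=32768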