This article attempts to describe the main differences between two popular data exchange protocols: FTP and HTTP.
We have actively used both protocols for many years in our main product, the MyChat enterprise messenger, and we have run into a lot of misconceptions about how these two fundamental protocols for transferring files over the internet work.
If you see some errors, write about them on the forum.
Both protocols are used for uploading and downloading files, both text and binary, over the internet or local networks. Both work over TCP/IP. But there are a number of significant differences between them:
- Transfer speed
- ASCII, EBCDIC, or binary formats
- Pipelined data transmission
- FTP commands/responses
- Two connections
- Firewalls and NAT
- Active and passive modes
- Encrypted controlling connections
- Authorization schemes
- Ranges and restoring downloading
- Persistent connections
- Encoding HTTP chunks
- Virtual hosting based on a name
- Viewing directories
- Proxy support
Transfer speed
Perhaps the most frequently asked question is: which protocol transfers files faster, FTP or HTTP?
What makes FTP fast?
- there is no metadata in the transmitted stream, only raw binary data. Auxiliary information is transmitted over a separate connection;
- there is no overhead for re-encoding the transmitted data.
What makes HTTP fast?
- reusing existing persistent connections improves TCP performance, because no time is wasted on creating new connections;
- pipelining makes it faster to request more than one file from the same server;
- (automatic) traffic compression decreases the amount of transmitted data. It can increase the transmission speed when the client and server are fast and the connection channel is slow;
- no series of control commands has to be exchanged for each transfer. This saves processing time.
The final result depends on the specific details, but I would say that for single static files you will not notice a difference.
For a single small file over a slow connection, FTP is the better option. When receiving multiple files in a row (especially small ones), HTTP usually shows better results.
FTP (RFC 959) appeared about 10 years before HTTP was invented. At that time, FTP was the main way to transfer files on the internet. The first traces of what became RFC 959 date back as far as 1971.
Both protocols can upload files. FTP has an "append" command, while HTTP takes the approach "here is your data, go figure out what to do with it". In other words, HTTP has no commands for managing uploaded files.
It should be mentioned that the WebDAV protocol exists. It is built on top of HTTP and allows working with files in the traditional way, as if they were located on your local device.
ASCII, EBCDIC, or binary formats
FTP has a notion of file format and can transfer data in both ASCII and binary (raw bytes) formats. HTTP always sends files in binary format. FTP can convert data on the fly when it is transmitted between systems with different architectures (Windows/Linux/mainframes).
For example, if the sender uses one scheme for encoding the end of a line (EOL, End-Of-Line) and the receiver uses another, FTP will make them understand each other. Unix uses a single character, LF (line feed, 0x0A), while MS Windows uses two characters in a row, CR and LF (carriage return and line feed, 0x0D 0x0A). EBCDIC conversion is used on old mainframes.
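The line-ending conversion described above can be sketched in a few lines. This is only an illustration of the idea behind ASCII mode, assuming that text travels over the wire with CRLF line endings and that each side converts to and from its local convention; the function names are made up for this example.

```python
def to_network_ascii(data: bytes) -> bytes:
    """Convert local text (LF or CRLF line endings) to CRLF for the wire."""
    # Normalize to LF first so that existing CRLF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def from_network_ascii(data: bytes, local_eol: bytes = b"\n") -> bytes:
    """Convert CRLF text received from the wire to the local line ending."""
    return data.replace(b"\r\n", local_eol)


unix_text = b"line one\nline two\n"
wire = to_network_ascii(unix_text)            # what an ASCII-mode sender transmits
windows_text = from_network_ascii(wire, b"\r\n")
print(wire)
print(windows_text)
```

In binary mode none of this happens: the bytes are delivered exactly as they are stored, which is why binary files must never be sent in ASCII mode.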
Unlike FTP, HTTP provides metadata for files, such as the "Content-Type" header. This metadata can be used to interpret the content.
Transferring files via HTTP always includes a set of headers with metadata. FTP never transmits any headers. As a result, when HTTP sends many small files, the headers make up a significant part of the traffic. HTTP headers contain information such as the date and time of the file's modification, the character encoding, the server's name and version, and so on.
Pipelined data transmission
HTTP supports pipelining. This means that the client can request the next file before the previous transfer is complete, which removes the round-trip delay between documents when downloading several of them in a row. TCP packets will be optimized for the maximum transfer speed.
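As a rough illustration of pipelining, the sketch below builds the bytes a client could write to a single connection to request three files back to back, without waiting for any response. The host name and paths are made up for the example, and a real client would also have to read the responses and match them to the requests in order.

```python
def build_pipelined_requests(host: str, paths: list[str]) -> bytes:
    """Concatenate several HTTP/1.1 GET requests into one outgoing buffer."""
    requests = []
    for path in paths:
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n"  # an empty line ends each request
        )
    return "".join(requests).encode("ascii")


# Hypothetical server and files, used only for illustration.
payload = build_pipelined_requests("files.example.com", ["/a.txt", "/b.txt", "/c.txt"])
print(payload.decode("ascii"))
```

All three requests leave the client in one go, and the server answers them in the same order over the same connection.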
FTP has something similar, but not quite the same: support for multiple requests to receive files in parallel over one control connection. Of course, this requires new TCP connections for transferring the data, one for each file. However, not all FTP servers support this feature.
FTP commands/responses
An FTP client can send many commands to the server and receive a response to each of them. Even the transfer of a single file involves a whole series of such simple commands. This, of course, hurts speed, because each command has to be processed on both sides, client and server, which introduces delays. An HTTP transfer is mostly just one request and one response (per file). Retrieving a single file over FTP can sometimes take up to a dozen commands and responses between the client and the server.
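The difference in chattiness can be made concrete with a small sketch. The FTP dialogue below is an illustrative sequence for an anonymous download in passive mode; the file name, directory, and password are placeholders, and real servers may need more or fewer commands. The reply codes are the usual successful ones for these commands.

```python
# (command, expected reply code) pairs on the FTP control connection.
FTP_DIALOGUE = [
    ("USER anonymous", "331"),          # user name accepted, password needed
    ("PASS guest@example.com", "230"),  # logged in
    ("CWD /pub", "250"),                # directory changed
    ("TYPE I", "200"),                  # switch to binary mode
    ("SIZE file.bin", "213"),           # file size
    ("PASV", "227"),                    # server announces a data port
    ("RETR file.bin", "150"),           # transfer starts, 226 follows at the end
    ("QUIT", "221"),                    # goodbye
]

# The same download over HTTP is a single request and a single response.
HTTP_DIALOGUE = [("GET /pub/file.bin HTTP/1.1", "200")]


def control_latency(dialogue: list[tuple[str, str]], rtt_seconds: float) -> float:
    """Estimate time spent waiting for replies, one round trip per command."""
    return len(dialogue) * rtt_seconds


rtt = 0.05  # assumed 50 ms round-trip time
print(len(FTP_DIALOGUE), "FTP commands,", control_latency(FTP_DIALOGUE, rtt), "s waiting")
print(len(HTTP_DIALOGUE), "HTTP request,", control_latency(HTTP_DIALOGUE, rtt), "s waiting")
```

The estimate ignores connection setup and the transfer itself; it only shows how the number of command/response pairs multiplies the round-trip time.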
Two connections
One of the biggest problems for FTP in real-world use is that it needs two connections. The first connection is for sending control commands, and the second one is for sending the file content. For this, FTP opens a separate TCP connection on a new port every time. If you send 100 files, 100 TCP connections will be opened and closed one by one.
Firewalls and NAT
FTP uses two connections: one for control and one for data transfer. The data connection can be opened in either direction and uses dynamic port numbers. This is very inconvenient for administrators and often requires the firewall to understand FTP at the protocol level in order to work reliably.
It also means that if both sides are located behind NAT, you most likely won't be able to use FTP.
Besides, NAT devices drop idle connections that have not sent data for a long time. During a long FTP transfer the control connection stays silent, so it is common for it to be disconnected because the NAT decided it was inactive.
To avoid this, you have to send dummy commands (such as NOOP) to keep the connection "alive". The result is a small amount of garbage traffic.
Active and passive modes
FTP opens the second connection in active or passive mode. If active mode is used (the server initiates the connection), you may face problems in complex networks, because such a connection is usually impossible through NAT. That is why in most cases passive mode is used, where connections are made only from the client side.
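To show why firewalls and NAT have to look inside the control connection, here is a minimal sketch of how the data connection address is exchanged. It assumes the common reply format "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" for passive mode and the PORT command for active mode; the addresses are examples, and the function names are made up for this illustration.

```python
import re


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract the host and port the client must connect to in passive mode."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"unexpected PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(x) for x in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def build_port_command(host: str, port: int) -> str:
    """Build the command an active-mode client sends so the server can connect back."""
    fields = host.split(".") + [str(port // 256), str(port % 256)]
    return "PORT " + ",".join(fields)


host, port = parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,195,80)")
print(host, port)
print(build_port_command("192.168.1.10", 50000))
```

Because the IP address and port travel as text inside the control connection, a NAT device has to rewrite them and a firewall has to read them to know which data connection to let through.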
Encrypted controlling connections
Since firewalls must be able to follow the FTP control connection in order to allow the second connection for transferring data, there is a huge problem with encrypted connections (FTP-SSL or FTPS). As soon as the control connection becomes encrypted, the firewall can no longer read its commands, so it does not know when and how it should allow the data connection between the client and server.
Besides, the development of the FTPS standard took a long time, which led to the coexistence of several hybrid versions that are incompatible with each other.
Authorization schemes
FTP and HTTP have several documented methods of authentication. Both protocols offer basic authentication in plain text (login/password). However, unlike FTP, HTTP has a few frequently used authentication methods that do not send the password as plain text.
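A short sketch shows why HTTP Basic authentication counts as plain text: the credentials are only Base64-encoded, not encrypted, so anyone who sees the header can decode them. The user name and password below are placeholders.

```python
import base64

PREFIX = "Authorization: Basic "


def basic_auth_header(user: str, password: str) -> str:
    """Build the header a client sends for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return PREFIX + token


def decode_basic_auth(header: str) -> tuple[str, str]:
    """Recover the credentials from the header, as any eavesdropper could."""
    token = header.removeprefix(PREFIX)
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_auth_header("user", "secret")
print(header)
print(decode_basic_auth(header))
```

FTP's USER and PASS commands are in the same position: without an encrypted connection, the password crosses the network in readable form.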
Both protocols can download files. Both had problems when downloading files larger than 2 gigabytes, but that is in the past. The problem is no longer relevant for modern clients, servers, and operating systems.
Ranges and restoring downloading
FTP supports resuming interrupted transfers in both directions, download and upload. HTTP can resume downloads, but when uploading files to the server, resuming and continuing the upload is often impossible.
Unlike FTP, HTTP supports a richer set of byte ranges for downloads.
Also, FTP has issues with resuming uploads or downloads from an offset beyond the 2 GB mark.
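The difference between the two resume mechanisms can be sketched as follows. The example assumes the HTTP "Range" request header with byte ranges and the FTP REST command followed by RETR; the file name and offsets are placeholders, and the helper names are made up for this illustration.

```python
from typing import Optional


def http_range_header(ranges: list[tuple[int, Optional[int]]]) -> str:
    """Build a Range header; an end of None means 'to the end of the file'."""
    parts = []
    for start, end in ranges:
        parts.append(f"{start}-" if end is None else f"{start}-{end}")
    return "Range: bytes=" + ",".join(parts)


def ftp_resume_commands(path: str, offset: int) -> list[str]:
    """FTP can only restart from a single offset and continue to the end."""
    return [f"REST {offset}", f"RETR {path}"]


# HTTP: the first 100 bytes plus everything from byte 500 onwards, in one request.
print(http_range_header([(0, 99), (500, None)]))
# FTP: continue a download that stopped after 1 MiB.
print(ftp_resume_commands("file.bin", 1048576))
```

HTTP can ask for several separate pieces of a file at once, while the FTP restart mechanism only expresses "continue from here".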
Persistent connections
An HTTP client can keep one persistent connection to the server for any number of file transfers.
FTP must create a new data connection for each new transfer. Repeatedly opening new connections is bad for performance because of the TCP handshake each one requires.
Encoding HTTP chunks
To avoid closing the connection just to signal the remote side that the transfer is complete, HTTP was extended with chunked transfer encoding, in which data is transmitted in blocks (chunks).
During the transfer, the sender sends the data stream in blocks (block size + data) until the data runs out, and then sends a zero-length block to indicate the end of the file.
As a result, there is no need to open and close the connection for each new file. Another advantage is the ability to detect premature disconnections during the transfer.
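The chunk format described above (size in hexadecimal, then the data, then a zero-length chunk at the end) can be sketched like this. It is a simplified illustration that ignores trailers; the function names and the tiny chunk size are chosen for the example only.

```python
def encode_chunked(data: bytes, chunk_size: int = 8) -> bytes:
    """Split data into chunks, each prefixed with its size in hexadecimal."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # the zero-length chunk marks the end of the body
    return bytes(out)


def decode_chunked(body: bytes) -> bytes:
    """Reassemble the data; raise ValueError if the stream was cut short."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)  # ValueError if the size line is missing
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(out)
        chunk = body[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("connection closed in the middle of a chunk")
        out += chunk
        pos += size + 2  # skip the chunk data and its trailing CRLF


encoded = encode_chunked(b"hello world", chunk_size=5)
print(encoded)
print(decode_chunked(encoded))
```

Because the receiver always knows how many bytes to expect next, a stream that stops before the zero-length chunk is recognizably incomplete, which is the premature-disconnection detection mentioned above.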
FTP offers official built-in RLE compression; however, it is usually ineffective for most binary and text data. There are many additional unofficial "hacks" for compressing FTP traffic, but none of them has become official or widely used.
FTP supports transferring data directly from one server to another, as if the transmission were performed by the client itself. However, on most servers this ability is disabled for security reasons, because the FXP protocol was poorly designed.
Both HTTP and FTP work well with IPv6. However, the original FTP specification had no support for IPv6, and many servers still lack the commands needed to enable it. The same applies to gateways between clients and servers that must understand FTP.
Virtual hosting based on a name
With HTTP 1.1, you can easily host many websites on one server, distinguished by name.
In FTP, name-based virtual hosting is not possible unless the server you connect to supports the HOST command. This is a new specification and it is not common yet.
Viewing directories
FTP can get a list of files in a folder on the remote server without downloading them. HTTP does not have such an ability.
However, since the authors of the FTP specification lived in a different time, the commands for listing the files in a directory (LIST and NLST) have no clearly defined output format. That is why authors of FTP clients have to write text parsers that guess what data the server sent. Later specifications (RFC 3659) introduced new commands such as MLSD, but they are still not widespread and are poorly supported by various servers and clients.
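For comparison, a machine-readable MLSD entry needs no guessing. The sketch below assumes the RFC 3659 line layout of "fact=value;" pairs, then a single space, then the file name; the sample line and the function name are made up for this illustration.

```python
def parse_mlsd_line(line: str) -> tuple[str, dict[str, str]]:
    """Split one MLSD entry into the file name and a dictionary of facts."""
    # Everything up to the first space is the fact list; the rest is the name,
    # so file names that contain spaces survive intact.
    facts_part, _, name = line.partition(" ")
    facts = {}
    for fact in facts_part.split(";"):
        if fact:
            key, _, value = fact.partition("=")
            facts[key.lower()] = value  # fact names are case-insensitive
    return name, facts


name, facts = parse_mlsd_line("type=file;size=1024;modify=20200101120000; my report.txt")
print(name)
print(facts)
```

A LIST reply, by contrast, is free-form text whose columns, date formats, and language depend on the server, which is why clients need heuristics to parse it.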
File listings over HTTP are usually transmitted as plain text, as HTML, or with the help of WebDAV, which works over HTTP.
Proxy support
One of the most significant advantages of HTTP over FTP is proxy support, which was built into HTTP from the very beginning. The technology is mature and works well. Many protocols can be encapsulated in HTTP, which acts as a kind of tunnel for passing through proxy servers.
FTP has always been used with proxy servers too, but this was never standardized and required a special approach in each specific case.