Recently I started following this guide to get started with downloading files from the internet. I read it and came up with the following code to download the HTTP body of a website. The only problem is that it's not working. The code stops at the recv() call. It does not crash; it just keeps on running. Is this my fault? Am I using the wrong approach? I intend to use the code to download not just the contents of .html files, but also other files (zip, png, jpg, dmg ...). I hope there's somebody that can help me. This is my code:
#include <stdio.h>
#include <sys/socket.h> /* SOCKET */
#include <netdb.h> /* struct addrinfo */
#include <stdlib.h> /* exit() */
#include <string.h> /* memset() */
#include <errno.h> /* errno */
#include <unistd.h> /* close() */
#include <arpa/inet.h> /* IP Conversion */
#include <stdarg.h> /* va_list */
#define SERVERNAME "developerief2.site11.com"
#define PROTOCOL "80"
#define MAXDATASIZE (1024*1024) /* parenthesized so the macro expands safely inside expressions */
void errorOut(int status, const char *format, ...);
void *get_in_addr(struct sockaddr *sa);
int main (int argc, const char * argv[]) {
int status;
// GET ADDRESS INFO
struct addrinfo *infos;
struct addrinfo hints;
// fill hints
memset(&hints, 0, sizeof(hints));
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = AI_PASSIVE;
hints.ai_family = AF_UNSPEC;
// get address info
status = getaddrinfo(SERVERNAME,
PROTOCOL,
&hints,
&infos);
if(status != 0)
errorOut(-1, "Couldn't get address information: %s\n", gai_strerror(status));
// MAKE SOCKET
int sockfd;
// loop, use first valid
struct addrinfo *p;
for(p = infos; p != NULL; p = p->ai_next) {
// CREATE SOCKET
sockfd = socket(p->ai_family,
p->ai_socktype,
p->ai_protocol);
if(sockfd == -1)
continue;
// TRY TO CONNECT
status = connect(sockfd,
p->ai_addr,
p->ai_addrlen);
if(status == -1) {
close(sockfd);
continue;
}
break;
}
if(p == NULL) {
fprintf(stderr, "Failed to connect\n");
return 1;
}
// LET USER KNOW
char printableIP[INET6_ADDRSTRLEN];
inet_ntop(p->ai_family,
get_in_addr((struct sockaddr *)p->ai_addr),
printableIP,
sizeof(printableIP));
printf("Connection to %s\n", printableIP);
// GET RID OF INFOS
freeaddrinfo(infos);
// RECEIVE DATA
ssize_t receivedBytes;
char buf[MAXDATASIZE];
printf("Start receiving\n");
receivedBytes = recv(sockfd,
buf,
MAXDATASIZE-1,
0);
printf("Received %d bytes\n", (int)receivedBytes);
if(receivedBytes == -1)
errorOut(1, "Error while receiving\n");
// null terminate
buf[receivedBytes] = '\0';
// PRINT
printf("Received Data:\n\n%s\n", buf);
// CLOSE
close(sockfd);
return 0;
}
void *get_in_addr(struct sockaddr *sa) {
// IP4
if(sa->sa_family == AF_INET)
return &(((struct sockaddr_in *) sa)->sin_addr);
return &(((struct sockaddr_in6 *) sa)->sin6_addr);
}
void errorOut(int status, const char *format, ...) {
va_list args;
va_start(args, format);
vfprintf(stderr, format, args);
va_end(args);
exit(status);
}
If you want to grab files using HTTP, then libcurl is probably your best bet in C. However, if you are using this as a way to learn network programming, then you are going to have to learn a bit more about HTTP before you can retrieve a file.
Your current program blocks in recv() because the server is still waiting for you: in HTTP the client speaks first, so you need to send an explicit request for the file before you can retrieve it. I would start by reading through RFC2616. Don't try to understand it all - it is a lot to read for this example. Read the first section to get an understanding of how HTTP works, then read sections 4, 5, and 6 to understand the basic message format.
Here is an example of what an HTTP request for the stackoverflow Questions page looks like:
GET http://stackoverflow.com/questions HTTP/1.1\r\n
Host: stackoverflow.com:80\r\n
Connection: close\r\n
Accept-Encoding: identity, *;q=0\r\n
\r\n
I believe that is a minimal request. I added the CRLFs explicitly to show that a blank line is used to terminate the request header block as described in RFC2616. If you leave out the Accept-Encoding header, then the resulting document may be transferred as a gzip-compressed stream, since HTTP explicitly allows this unless you tell the server that you do not want it.
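To make this concrete for the program in the question, something like the following could go between connect() and recv(). Treat it as a rough sketch rather than the way to do it: build_get_request and send_all are names I made up for this example, and it sends only the path as the request target instead of the absolute URL shown above (servers accept both forms).

```c
#include <stdio.h>      /* snprintf() */
#include <string.h>     /* strlen() */
#include <sys/types.h>  /* ssize_t */
#include <sys/socket.h> /* send() */

/* Hypothetical helper: format a minimal GET request into out.
 * Returns the request length in bytes, or -1 if it does not fit. */
int build_get_request(char *out, size_t outsize,
                      const char *host, const char *target) {
    int n = snprintf(out, outsize,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "Accept-Encoding: identity, *;q=0\r\n"
                     "\r\n",               /* blank line ends the header block */
                     target, host);
    if (n < 0 || (size_t)n >= outsize)
        return -1;                         /* formatting error or truncated */
    return n;
}

/* Hypothetical helper: send() may accept fewer bytes than requested,
 * so keep calling it until everything has been handed to the kernel.
 * Returns 0 on success, -1 on error (errno is set by send()). */
int send_all(int sockfd, const char *data, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sockfd, data + sent, len - sent, 0);
        if (n == -1)
            return -1;
        sent += (size_t)n;
    }
    return 0;
}
```

In the question's main() you would call build_get_request(req, sizeof(req), SERVERNAME, "/") on a local char array and then send_all(sockfd, req, length) before the first recv(); once the server has a request to answer, recv() has something to return.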
The server response also contains HTTP headers for the meta-data describing the response. Here is an example of a response from the previous request:
HTTP/1.1 200 OK\r\n
Server: nginx\r\n
Date: Sun, 01 Aug 2010 13:54:56 GMT\r\n
Content-Type: text/html; charset=utf-8\r\n
Connection: close\r\n
Cache-Control: private\r\n
Content-Length: 49731\r\n
\r\n
\r\n
\r\n
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" ... 49,667 bytes follow
This simple example should give you an idea of what you are getting into if you want to grab files using HTTP. This is the best-case, simplest example. This isn't something that I would undertake lightly, but it is probably the best way to learn and appreciate HTTP.
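For the receiving side, here is a sketch of what handling a response like the one above might involve. All function names are invented for this example, and it assumes the simple case shown: the request used Connection: close, the body is not chunked, and the whole response fits in the caller's buffer. A single recv() call will usually return only part of a response, so the sketch keeps reading until the server closes the connection. The body is located by length rather than treated as a C string, because files such as zip or png contain NUL bytes.

```c
#include <string.h>     /* memcmp() */
#include <strings.h>    /* strncasecmp() */
#include <sys/types.h>  /* ssize_t */
#include <sys/socket.h> /* recv() */

/* Read until the peer closes the connection or the buffer is full.
 * Returns the number of bytes read, or -1 on error. */
ssize_t recv_all(int sockfd, char *buf, size_t bufsize) {
    size_t total = 0;
    while (total < bufsize) {
        ssize_t n = recv(sockfd, buf + total, bufsize - total, 0);
        if (n == -1)
            return -1;
        if (n == 0)              /* orderly shutdown by the server */
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Return a pointer to the first body byte (just past the blank line
 * that ends the headers), or NULL if the headers are incomplete. */
const char *find_body(const char *resp, size_t len) {
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(resp + i, "\r\n\r\n", 4) == 0)
            return resp + i + 4;
    return NULL;
}

/* Parse the three-digit code from a status line such as
 * "HTTP/1.1 200 OK". Returns -1 if the line is malformed. */
int parse_status(const char *resp, size_t len) {
    size_t i = 0;
    while (i < len && resp[i] != ' ')   /* skip the HTTP version */
        i++;
    if (i + 4 > len)                    /* need a space plus three digits */
        return -1;
    int code = 0;
    for (size_t k = 1; k <= 3; k++) {
        char c = resp[i + k];
        if (c < '0' || c > '9')
            return -1;
        code = code * 10 + (c - '0');
    }
    return code;
}

/* Look for a Content-Length header within the first header_len bytes.
 * Header names are case-insensitive. Returns the value, or -1 if the
 * header is absent or has no digits. */
long content_length(const char *resp, size_t header_len) {
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof(name) - 1;
    size_t i = 0;
    while (i < header_len) {            /* i is at the start of a line */
        if (i + name_len <= header_len &&
            strncasecmp(resp + i, name, name_len) == 0) {
            size_t j = i + name_len;
            long value = 0;
            int digits = 0;
            while (j < header_len && (resp[j] == ' ' || resp[j] == '\t'))
                j++;
            while (j < header_len && resp[j] >= '0' && resp[j] <= '9') {
                value = value * 10 + (resp[j] - '0');
                j++;
                digits++;
            }
            return digits ? value : -1;
        }
        while (i < header_len && resp[i] != '\n')  /* go to next line */
            i++;
        i++;
    }
    return -1;
}
```

After recv_all() returns, find_body() gives you the start of the payload, and the number of body bytes is the total received minus the header length; comparing that against content_length() is a cheap check that nothing was cut off. Redirects, chunked transfer encoding and keep-alive connections are all left out here, which is a good part of why a real HTTP client is more work than it first appears.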
If you are looking for a simple way to learn network programming, this is a decent way to start. I would recommend picking up a copy of TCP/IP Illustrated, Volume 1 and UNIX Network Programming, Volume 1. These are probably the best way to really learn how to write network-based applications. I would probably start by writing an FTP client since FTP is a much simpler protocol to start with.
If you are trying to learn the details associated with HTTP, then experiment with
telnet server 80
and typing in requests by hand, and with the --verbose and --include command line options of a command-line client such as curl, so that you can see what is happening.
Just don't plan on writing your own HTTP client for enterprise use. You do not want to do that, trust me as one who has been maintaining such a mistake for a little while now...