The program listed here creates a socket and connects it to a remote computer.
We assume that there is a web server at the other end. Once the socket is
connected, the program sends a request, reads what the remote machine sends
back over the socket and prints it to the screen. Then the program exits.
Program listing:
1 #include <stdio.h>
2 #include <string.h>
3 #include <stdlib.h>
4 #include <unistd.h>
5 #include <fcntl.h>
6 #include <netinet/tcp.h>
7 #include <sys/socket.h>
8 #include <sys/types.h>
9 #include <netinet/in.h>
10 #include <netdb.h>
11 int socket_connect(char *host, in_port_t port){
12 struct hostent *hp;
13 struct sockaddr_in addr;
14 int on = 1, sock;
15 if((hp = gethostbyname(host)) == NULL){
16 herror("gethostbyname");
17 exit(1);
18 }
19 bcopy(hp->h_addr, &addr.sin_addr, hp->h_length);
20 addr.sin_port = htons(port);
21 addr.sin_family = AF_INET;
22 sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
23 setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (const char *)&on, sizeof(int));
24 if(sock == -1){
25 perror("setsockopt");
26 exit(1);
27 }
28 if(connect(sock, (struct sockaddr *)&addr, sizeof(struct sockaddr_in)) == -1){
29 perror("connect");
30 exit(1);
31 }
32 return sock;
33 }
34 #define BUFFER_SIZE 1024
35 int main(int argc, char *argv[]){
36 int fd;
37 char buffer[BUFFER_SIZE];
38 if(argc < 3){
39 fprintf(stderr, "Usage: %s <hostname> <port>\n", argv[0]);
40 exit(1);
41 }
42 fd = socket_connect(argv[1], atoi(argv[2]));
43 write(fd, "GET /\r\n", strlen("GET /\r\n"));
44 bzero(buffer, BUFFER_SIZE);
45 while(read(fd, buffer, BUFFER_SIZE - 1) != 0){
46 fprintf(stderr, "%s", buffer);
47 bzero(buffer, BUFFER_SIZE);
48 }
49 shutdown(fd, SHUT_RDWR);
50 close(fd);
51 return 0;
52 }
First, note that the line numbers at the beginning of each line are only there
to make it easier to go through the program. If you intend to compile this
program, remove all the line numbers first.
To compile the program you need a C compiler such as GCC. Provided that you
saved the code in a file called connect_socket.c, type the following in your
shell:
$ gcc -o connect_socket connect_socket.c
This produces a program called connect_socket, which you can try using:
$ ./connect_socket linuxdocs.org 80
Web servers typically listen on port 80, so if there is a web server at
linuxdocs.org you should get the index page.
Now to the program, what does it do? We start at line 35. This is where
execution of a C program normally begins. It is a function called main that
takes two arguments, int argc and char *argv[]. argc is the number of
arguments to the program, counting the program name itself. The arguments are
the ones you provided on the command line, for example "linuxdocs.org" and
"80" in the case above. They are stored in argv, which is an array of pointers
to strings (char arrays). The program name is stored in the first element, so
if you want to print the program's name you can try:
printf("Hello I'm a program called: %s\n", argv[0]);
Further, at line 36 we declare a file descriptor called fd. We will use it
later to read from the socket we are about to create. A char array named
buffer is also declared to store the incoming data in; more about that later.
We want the array to be of size BUFFER_SIZE. Earlier, on line 34, we defined
BUFFER_SIZE to be 1024. The #define statement is a preprocessor directive. For
now we just state that at every place where we use the word BUFFER_SIZE we get
the value 1024.
At line 38 we check that the user has supplied the program with the correct
number of arguments.
38 if(argc < 3){
39 fprintf(stderr, "Usage: %s <hostname> <port>\n", argv[0]);
40 exit(1);
41 }
That is, argc < 3. Recall that the array called argv should hold three
entries: first the name of the program, then the two arguments supplied by the
user. So we are really interested in argv[1] and argv[2]. If the user fails to
provide enough arguments we print an error message at line 39 and exit the
program at line 40. What is this stderr on line 39, the first argument to
fprintf? There are three standard file streams that most operating systems
provide: stdin, stdout and stderr. stderr is an unbuffered output stream, so
everything written to it is printed directly to the terminal. stdout, however,
is usually buffered and may hold on to its data for a while. If the program
terminates in some bad way, like a segmentation fault, it's not guaranteed
that things written to stdout will be displayed. Therefore error messages are
usually written to stderr, so that we are far more likely to see them. If
you're interested in fprintf you should check its man page: man fprintf.
Please note that it's considered good conduct to check the arguments provided
to a program; it can't really hurt.
On line 42 we call the function socket_connect. It looks like this:
42 fd = socket_connect(argv[1], atoi(argv[2]));
The first argument provided to the program is the host we wish to connect to;
we simply pass that argument to the function. The second argument is the port.
Here, however, we get a string, i.e. "80", and need to convert it to an
integer. This is done by the function atoi(), which takes a string as argument
and tries to convert it to an integer. atoi is not guaranteed to return
something sensible, because it has no way to report an error. If you give it
"hello" it returns 0, which cannot be told apart from a real "0", and the
behavior is undefined if the number is too large to fit in an int. So be
careful not to assume anything about the strings you pass to atoi.
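If you want to reject bad port arguments instead of silently getting 0, the
standard function strtol can be used in place of atoi. The following is only a
sketch; parse_port is a name made up for this illustration and is not part of
the program above. It assumes that a usable port lies between 1 and 65535.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the original listing.
 * Returns the port number (1..65535), or -1 if text is not a valid port. */
int parse_port(const char *text)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')   /* no digits at all, or trailing junk */
        return -1;
    if (errno == ERANGE || value < 1 || value > 65535)
        return -1;                     /* overflow, or outside the port range */
    return (int)value;
}
```

With such a helper, main could print the usage message and exit when
parse_port(argv[2]) returns -1.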
Now back to line 11 where the function socket_connect is defined. The first
part looks like this:
11 int socket_connect(char *host, in_port_t port){
12 struct hostent *hp;
13 struct sockaddr_in addr;
14 int on = 1, sock;
Here we have declared a variable hp, which is a pointer to a struct hostent.
It is used later when we look up the host address associated with the hostname
given to the program. We'll look at that a bit later. addr is a variable of
type struct sockaddr_in. It is used later when we open the connection to the
remote host. We also declare an int called on, set to 1, which helps us later
when we set a socket option. Last comes a variable called sock, the actual
file descriptor that we will associate the opened socket with later on.
Now we come to the part where we actually try to resolve the host address
associated with the host name. This is done on line 15.
15 if((hp = gethostbyname(host)) == NULL){
16 herror("gethostbyname");
17 exit(1);
18 }
The function gethostbyname takes a char* argument, which might be something
like "www.google.com" or "192.168.0.1". It returns a pointer to a struct
hostent. We check that we get something sensible out of it, i.e. if the
pointer was assigned the value NULL an error has occurred. Exactly what
happened is unknown, but we assume that the function herror can tell us.
Therefore we call herror with the argument "gethostbyname". For example, we
might try to look up a hostname that does not exist:
$ ./connect_socket www.hshsasjdhas.dfhsaj 80
gethostbyname: Unknown host
The manpage for gethostbyname has additional useful information.
On line 19 we take the result from gethostbyname, that is hp, and use a member
of the struct called h_addr. This member holds the IP address of the host,
typically encoded as 4 bytes. This is not always the case, so rather than
assuming anything about the length we use hp->h_length, a member that gives
the exact length of the address. Check, again, the manpage for gethostbyname
if you're interested in what the struct contains. Anyway, we use bcopy to copy
the address into the member of addr called sin_addr. bcopy works on memory
addresses, so we have to give it a pointer to addr.sin_addr rather than the
value itself. That is done with the & operator (hp->h_addr is already a
pointer). This might be confusing at first, but rather than covering all the
details here we just state that you have to do it like that. Anyway, now the
address is copied into addr.
19 bcopy(hp->h_addr, &addr.sin_addr, hp->h_length);
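To see what actually ends up in h_addr, the bytes can be turned back into
readable dotted-decimal text. The sketch below is one way to do that; the name
resolve_ipv4 is made up for this illustration, memcpy is used where the
listing uses bcopy, and only IPv4 results are accepted.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve host and write its first IPv4 address as
 * dotted-decimal text into out. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, char *out, size_t outlen)
{
    struct hostent *hp = gethostbyname(host);
    struct in_addr ip;

    /* Refuse anything that is not a 4 byte IPv4 address. */
    if (hp == NULL || hp->h_addrtype != AF_INET
        || hp->h_length != (int)sizeof(ip))
        return -1;

    /* Same copy as line 19 of the listing, into a local struct. */
    memcpy(&ip, hp->h_addr_list[0], sizeof(ip));

    /* inet_ntop returns NULL if the output buffer is too small. */
    return inet_ntop(AF_INET, &ip, out, (socklen_t)outlen) ? 0 : -1;
}
```

A buffer of INET_ADDRSTRLEN bytes is large enough for any IPv4 address in text
form.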
What about the port? A port is a 16 bit number, so a machine on the Internet
has 65536 possible ports (0 to 65535). Mostly it listens on just a few of
them. We assume that the typical web server listens on port 80, so that is
what we try. Now comes an interesting function called htons.
20 addr.sin_port = htons(port);
Some architectures, like SPARC, use something called BIG ENDIAN byte order,
and some, like Intel and clones, use LITTLE ENDIAN byte order. What is the
difference? It all has to do with the order in which they store the bytes of
an integer. As an example, take a 32 bit integer. We can think of the integer
as being built out of 4 bytes (one byte = 8 bits, thus 8*4 = 32), something
like this: A,B,C,D, where A is the most significant byte, B the next and so
on. BIG ENDIAN stores them in memory in the order A,B,C,D, while LITTLE ENDIAN
stores them like this: D,C,B,A. Alright, so they have different ways to encode
the same number. What's kind of interesting here is that it's not the order of
the bits that is reversed, but the order of the bytes. If we wish to send
binary data from one machine to another, it is very useful to know how both
machines interpret and encode integers. And now we are going to send a packet
over the Internet to a host of unknown architecture. We better take some
precautions. To deal with this matter it has been decided that the Internet
uses BIG ENDIAN byte order. Simple as that. The htons function, which is short
for host-to-network-short, changes the byte order of a 16 bit value if
necessary. A way to check what byte order your machine has is to run the
following test:
printf("%d\n", htons(666));
If it prints 39426 you're on a machine that uses LITTLE ENDIAN (666 is 0x029A,
and with the two bytes swapped it becomes 0x9A02 = 39426), and if it prints
666 you're on a BIG ENDIAN machine. Continuing with line 21, we simply tell
the addr struct that we are interested in the Address Family InterNET,
AF_INET.
21 addr.sin_family = AF_INET;
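Going back to the byte order discussion for a moment, the swap that htons
performs on a LITTLE ENDIAN machine can be written out by hand. This is only
an illustrative sketch; swap16 and host_is_little_endian are names made up
here, and real programs should simply call htons and ntohs.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Swap the two bytes of a 16 bit value, e.g. 0x029A becomes 0x9A02. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    return *(unsigned char *)&probe == 1;
}
```

On a LITTLE ENDIAN host htons(666) gives the same result as swap16(666), and
on a BIG ENDIAN host htons returns its argument unchanged. In both cases
ntohs(htons(x)) gives back x.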
When this is done we create a socket. As told by the socket manpage (try
man -S 2 socket if you get nothing, or unrelated info), this simply creates a
communication endpoint. The socket is not connected to anything yet, but we
specify some interesting attributes for it. First we use something called
PF_INET, which specifies which protocol family we want to use. PF_INET
corresponds to the IPv4 protocol family. You could for instance use PF_INET6,
which corresponds to IPv6, or PF_IPX, which is the Novell protocol, and so on.
Then we tell the socket function that we are interested in using SOCK_STREAM;
this argument is the type of communication. SOCK_STREAM corresponds to two-way,
reliable, connection-based communication. You could for example use SOCK_DGRAM
here if you want to send datagram packets. Last we specify that we are
interested in a TCP connection by giving IPPROTO_TCP as argument. Again, check
the socket manpage for more details.
22 sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
More options; this is getting more and more complex. :) We use setsockopt to
set some options for the socket. First we tell it which level the option
belongs to, IPPROTO_TCP again. Then we name the option itself, TCP_NODELAY,
which asks TCP to send small writes right away instead of holding them back
to gather them into larger packets. If you recall the variable on from before,
we set it to 1. Passing it to setsockopt means that we want to enable
TCP_NODELAY rather than disable it; 0 would do that. Interestingly enough we
pass a pointer; recall that & gets the address of a variable, in this case on.
We also tell how large this variable is with the last argument, sizeof(int).
setsockopt is quite a useful function that can manipulate a lot of the
properties that sockets have. Check out the manpage for setsockopt to get more
details.
23 setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (const char *)&on, sizeof(int));
24 if(sock == -1){
25 perror("setsockopt");
26 exit(1);
27 }
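If the option manipulation fails for some reason, maybe because some option we
try to enable is not available for the type of communication we want to use,
setsockopt returns -1. Note, however, that the listing never looks at that
return value. The test at lines 24 to 27 checks sock, which is -1 only when
the earlier call to socket on line 22 failed, so the "setsockopt" label given
to perror is misleading. The general idea is the same as with gethostbyname
before, but we use another error function here, perror, which prints a message
based on errno. Check the manpage for perror if you're interested.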
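A variant that checks each call right where it is made could look like the
sketch below. open_nodelay_socket is a name made up for this illustration, and
returning -1 instead of calling exit is a design choice of the sketch, not
something the original program does.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket with TCP_NODELAY enabled.
 * Returns the descriptor, or -1 after printing which call failed. */
int open_nodelay_socket(void)
{
    int on = 1;
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    if (sock == -1) {                  /* socket() itself failed */
        perror("socket");
        return -1;
    }
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) {
        perror("setsockopt");          /* now the label matches the call */
        close(sock);
        return -1;
    }
    return sock;
}
```

This way each error message names the call that actually failed, and the
socket is closed again if the option cannot be set.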
Now, at last, we are ready to connect our socket to a remote machine. The
function connect does this. We use the sock variable that we have done a lot
of things with. We also use the addr variable, which holds the information
about where we wish to connect. Observe that we cast the address of addr to a
pointer of type struct sockaddr. Again we use the & operator to get the
address of the struct. We also tell how large this struct is by using
sizeof(struct sockaddr_in). Then we check the return value; if it is -1 we
have problems. For instance, we might try to connect to a port that nothing is
listening on. For example:
$ ./connect_socket localhost 6677
connect: Connection refused
28 if(connect(sock, (struct sockaddr *)&addr, sizeof(struct sockaddr_in)) == -1){
29 perror("connect");
30 exit(1);
31 }
32 return sock;
Error handling here is very similar to the above examples. Now we return the
newly created socket at line 32. Back in the main function, we want to read
things from this socket.
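The address setup and the connect call from socket_connect can be put together
in a compact form. The sketch below is a simplified illustration, not the
program's own function: connect_ipv4 is a made-up name, it takes a
dotted-decimal address instead of a hostname (using inet_pton instead of
gethostbyname), and it clears addr with memset before filling it in, which the
listing does not do.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to a dotted-decimal IPv4 address.
 * Returns a connected descriptor, or -1 on any failure. */
int connect_ipv4(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int sock;

    memset(&addr, 0, sizeof(addr));    /* start from a known, zeroed struct */
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);       /* port in network byte order */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                     /* not a valid IPv4 string */

    sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sock == -1)
        return -1;
    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(sock);                   /* do not leak the descriptor */
        return -1;
    }
    return sock;
}
```

A call such as connect_ipv4("127.0.0.1", 80) returns -1 if nothing is
listening there, where the listing would print "connect: Connection refused"
and exit.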
But how do we make the web server at the other end send us anything? Well,
luckily the procedure is very simple. Sending "GET /\r\n" to a web server
tells the server that we want the root of the server, which often defaults to
the index page. The "\r\n" is the standard way of telling the server at the
other end that we won't send anything else on the same line, so it's safe to
interpret the line as is. This bare form of request dates back to the earliest
version of HTTP, so some servers today may answer it with an error page
instead. The function write sends the request for us. It takes 3 parameters:
the file descriptor fd, that is the socket; the string we want to send, that
is "GET /\r\n"; and last the length of that string.
43 write(fd, "GET /\r\n", strlen("GET /\r\n"));
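Note that write returns the number of bytes it actually wrote, which may be
fewer than requested, and -1 on error. The listing ignores the return value,
which is usually fine for a 7 byte request. For longer data a loop like the
following sketch can be used; write_all is a name made up for this
illustration.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all len bytes are sent.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n == -1) {
            if (errno == EINTR)        /* interrupted by a signal, retry */
                continue;
            return -1;
        }
        data += n;                     /* skip what has been sent */
        len -= (size_t)n;
    }
    return 0;
}
```

Line 43 would then become something like
write_all(fd, "GET /\r\n", strlen("GET /\r\n")).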
After that we take the buffer we declared before and set all bytes in this
buffer to 0. This is to avoid junk data that the buffer might contain.
44 bzero(buffer, BUFFER_SIZE);
Then, as long as read indicates that there are still things to read, we read
from the socket. read returns the number of bytes that have been read. A
result of 0 means end of file: the other end has closed the connection and
there is nothing more to read. Since we work with blocking IO, read waits
until it can read something, which is good since it might take some time for
the data to travel over the Internet. Note that read returns -1 on error,
which the loop as written does not treat specially. The arguments to read are
the file descriptor fd, i.e. the one we are reading from, the char array
buffer in which we store the data, and lastly the maximum number of bytes we
want to read each time. But why not read BUFFER_SIZE bytes? We only read
BUFFER_SIZE - 1 bytes. This is because we want the last byte in the char array
to stay 0, as set by the call to bzero before. When we print the contents of
the buffer using fprintf on line 46, fprintf must know when to stop printing.
With "%s", fprintf stops printing when it sees 0, or '\0' if you want the char
value for 0. Otherwise we would print other things in memory that come after
the buffer, something that might end in a segmentation fault when trying to
read memory we have no access to. After we have printed the message we bzero
the buffer again and continue until no data is left to print.
45 while(read(fd, buffer, BUFFER_SIZE - 1) != 0){
46 fprintf(stderr, "%s", buffer);
47 bzero(buffer, BUFFER_SIZE);
48 }
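An alternative to relying on the zeroed buffer is to use the byte count that
read returns, which also makes it possible to stop on errors. The following is
a sketch of that idea and not the program's own loop; copy_stream is a name
made up for this illustration, and it uses fwrite instead of fprintf so that
no terminating 0 byte is needed.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy everything readable from fd to out.
 * Returns the total number of bytes copied, or -1 on error. */
long copy_stream(int fd, FILE *out)
{
    char buffer[1024];
    long total = 0;
    ssize_t n;

    for (;;) {
        n = read(fd, buffer, sizeof(buffer));
        if (n == 0)                    /* the other end closed the connection */
            break;
        if (n == -1) {
            if (errno == EINTR)        /* interrupted by a signal, retry */
                continue;
            return -1;
        }
        /* Write exactly the n bytes we received. */
        if (fwrite(buffer, 1, (size_t)n, out) != (size_t)n)
            return -1;
        total += n;
    }
    return total;
}
```

In main this could be called as copy_stream(fd, stderr) in place of lines 44
to 48.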
When we're done we shut down the socket using shutdown, and specify that we
are no longer interested in reading (RD) or writing (WR) by passing SHUT_RDWR.
After that we close the file descriptor using close and return 0 to the shell,
just for the sake of good conduct.
49 shutdown(fd, SHUT_RDWR);
50 close(fd);
51 return 0;
That's about it. Quite frankly, this example might be a bit hard to begin with
since it's lengthy and contains a lot of socket yadda yadda. But I assume that
most people would want something more 'useful' than another hello world
described in great detail.