Thursday, May 23, 2013

HTTP Protocol

HTTP stands for Hypertext Transfer Protocol. It is a TCP/IP-based communication protocol used to deliver virtually all files and other data, collectively called resources, on the World Wide Web. These resources could be HTML files, image files, query results, or anything else.
A browser works as an HTTP client because it sends requests to an HTTP server, also called a Web server. The Web server then sends responses back to the client. HTTP servers listen on port 80 by default, but this can be changed to any other port, such as 8080.
There are three important things about HTTP of which you should be aware:

  • HTTP is connectionless: The client opens a connection, sends a request, and waits for the response. Once the server has delivered the response, the connection is closed. In HTTP/1.0, each request normally uses a new connection.
  • HTTP is media independent: Any type of data can be sent by HTTP as long as both the client and server know how to handle the data content. How content is handled is determined by the MIME specification.
  • HTTP is stateless: This is a direct result of HTTP being connectionless. The server and client are aware of each other only during a request. Afterwards, each forgets the other. For this reason, neither the client nor the server can retain information between different requests across web pages.
The following diagram shows where HTTP fits in communication:

[Diagram: where HTTP fits in communication]

Like most network protocols, HTTP uses the client-server model: An HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. After delivering the response, the server closes the connection.
The formats of the request and response messages are similar and have the following structure:
  • An initial line, ending in CRLF
  • Zero or more header lines, each ending in CRLF
  • A blank line, i.e. a CRLF by itself
  • An optional message body, such as a file, query data, or query output.
Initial lines and headers should end in CRLF, though you should gracefully handle lines ending in just LF. More exactly, CR and LF here mean ASCII values 13 and 10.
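
The structure above can be sketched in Python. This is only an illustration, not a full parser: the function name split_message is made up for this example, and it assumes the whole message is already in memory as bytes. It accepts both CRLF and bare LF line endings.

```python
import re

# The header section ends at the first blank line (CRLF CRLF, or LF LF).
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def split_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, header_lines, body)."""
    match = _BLANK_LINE.search(raw)
    if match is None:
        # No blank line found: treat everything as initial line plus headers.
        head, body = raw, b""
    else:
        head, body = raw[:match.start()], raw[match.end():]
    # Split the head into lines, tolerating lines that end in just LF.
    lines = [line.decode("iso-8859-1") for line in re.split(rb"\r?\n", head)]
    header_lines = [line for line in lines[1:] if line]
    return lines[0], header_lines, body
```

The body is returned as bytes because it may hold binary data such as an image.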

Initial Line : Request

The initial line is different for the request than for the response. A request line has three parts, separated by spaces:
  • An HTTP Method Name
  • The local path of the requested resource.
  • The version of HTTP being used.
Here is an example of the initial line of a request message:
GET /path/to/file/index.html HTTP/1.0
  • GET is the most common HTTP method. Other methods could be POST, HEAD etc.
  • The path is the part of the URL after the host name. This path is also called the request Uniform Resource Identifier (URI). A URI is like a URL, but more general.
  • The HTTP version always takes the form "HTTP/x.x", uppercase.
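
As a rough sketch, the three parts of a request line can be pulled apart in Python like this. The names RequestLine and parse_request_line are invented for the example, and the checks are deliberately minimal.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    path: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split 'METHOD path HTTP/x.x' into its three space-separated parts."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError("expected three parts separated by single spaces")
    method, path, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError("version must look like HTTP/x.x")
    return RequestLine(method, path, version)
```

For the example line above, parse_request_line("GET /path/to/file/index.html HTTP/1.0").method is "GET".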

Initial Line : Response

The initial response line, called the status line, also has three parts separated by spaces:
  • The version of HTTP being used.
  • A response status code that gives the result of the request.
  • An English reason phrase describing the status code.
Here is an example of the initial line of a response message:
HTTP/1.0 200 OK

or

HTTP/1.0 404 Not Found
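
A status line can be split in much the same way as a request line. The sketch below uses a made-up helper name, parse_status_line. It splits on the first two spaces only, because the reason phrase may itself contain spaces.

```python
def parse_status_line(line: str):
    """Return (version, status_code, reason_phrase) from a status line."""
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError("status line needs at least a version and a code")
    version, code = parts[0], int(parts[1])
    # The reason phrase is for humans; treat it as optional.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason
```

Programs should branch on the numeric code, not on the reason phrase.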

Header Lines

Header lines provide information about the request or response, or about the object sent in the message body.
The header lines are in the usual text header format, which is: one line per header, of the form "Header-Name: value", ending with CRLF. It's the same format used for email and news postings, defined in RFC 822.
  • A header line should end in CRLF, but you should handle LF correctly.
  • The header name is not case-sensitive.
  • Any number of spaces or tabs may be between the ":" and the value.
  • Header lines beginning with space or tab are actually part of the previous header line, folded into multiple lines for easy reading.
Here is an example of a header line:
User-agent: Mozilla/3.0Gold

or

Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
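
The header rules listed above (case-insensitive names, optional whitespace after the colon, folded continuation lines) could be handled roughly as follows. parse_headers is a hypothetical helper for illustration. It keeps only the last value when a header name repeats, which a real parser may not want.

```python
def parse_headers(lines):
    """Turn header lines into a dict keyed by lower-cased header name."""
    headers = {}
    last_name = None
    for line in lines:
        # A line starting with space or tab continues the previous header.
        if line[:1] in (" ", "\t") and last_name is not None:
            headers[last_name] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("header line has no colon: %r" % line)
        last_name = name.strip().lower()  # names are not case-sensitive
        headers[last_name] = value.strip()  # drop spaces/tabs around the value
    return headers
```

Lower-casing the names means "User-agent" and "User-Agent" end up under the same key.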

The Message Body

An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.
If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular:
  • The Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/gif.
  • The Content-Length: header gives the number of bytes in the body.
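
To illustrate how these two headers describe the body, here is a small sketch that assembles a response. build_response is an invented name, and the default status line is just an example. Note that Content-Length counts bytes, not characters.

```python
def build_response(body: bytes, content_type: str, status: str = "200 OK") -> bytes:
    """Assemble an HTTP/1.0 response whose headers describe the body."""
    head = (
        f"HTTP/1.0 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # number of bytes in the body
        "\r\n"  # blank line separating headers from body
    )
    return head.encode("ascii") + body
```

A body containing non-ASCII text shows the difference: the UTF-8 encoding of "é" takes two bytes, so the length is larger than the character count.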


The set of common methods for HTTP/1.0 is defined below, although this set can be extended.

The GET Method

The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process.
A conditional GET method requests that the identified resource be transferred only if it has been modified since the date given by the If-Modified-Since header. The conditional GET method is intended to reduce network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring unnecessary data.
The GET method can also be used to submit forms. The form data is URL-encoded and appended to the request URI.
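
A short Python sketch of form submission by GET, using the standard urllib.parse module. The helper name form_to_get_url and the example host are placeholders, not part of any real site.

```python
from urllib.parse import urlencode


def form_to_get_url(action: str, fields: dict) -> str:
    """Append URL-encoded form fields to the form's action URL."""
    # urlencode escapes special characters and turns spaces into '+'.
    return action + "?" + urlencode(fields)
```

For example, the fields name1=value1 and name2=value2 submitted to http://myhost.com/mypath/myscript.cgi produce a URL ending in ?name1=value1&name2=value2.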

The HEAD Method

A HEAD request is just like a GET request, except it asks the server to return the response headers only, and not the actual resource (i.e. no message body). This is useful to check characteristics of a resource without actually downloading it, thus saving bandwidth. Use HEAD when you don't actually need a file's contents.
The response to a HEAD request must never contain a message body, just the status line and headers.
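
From the server's side, the difference between GET and HEAD might be sketched like this. The function respond is hypothetical and ignores everything about the request except its method. The headers are identical in both cases; only the body is left out for HEAD.

```python
def respond(method: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Build a 200 response; for HEAD, send the same headers but no body."""
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # size the body would have for GET
        "\r\n"
    ).encode("ascii")
    if method == "HEAD":
        return head
    return head + body
```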

The POST Method

A POST request is used to send data to the server to be processed in some way, like by a CGI script. A POST request is different from a GET request in the following ways:
  • There's a block of data sent with the request, in the message body. There are usually extra headers to describe this message body, like Content-Type: and Content-Length:
  • The request URI is not a resource to retrieve; it's usually a program to handle the data you're sending.
  • The HTTP response is normally program output, not a static file.
The most common use of POST, by far, is to submit HTML form data to CGI scripts. In this case, the Content-Type: header is usually application/x-www-form-urlencoded, and the Content-Length: header gives the length of the URL-encoded form data. The CGI script receives the message body through STDIN, and decodes it. Here's a typical form submission, using POST:
POST /path/script.cgi HTTP/1.0
From: frog@jmarshall.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Mosby&favorite+flavor=flies
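
The request above can be generated programmatically. The following is a sketch only: build_form_post is an invented helper, and it assumes the field names and values are plain ASCII text.

```python
from urllib.parse import urlencode


def build_form_post(path: str, fields: dict, extra_headers=()) -> bytes:
    """Build an HTTP/1.0 POST request carrying URL-encoded form data."""
    body = urlencode(fields).encode("ascii")
    lines = [
        f"POST {path} HTTP/1.0",
        *extra_headers,
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # length of the encoded form data
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii") + body
```

Calling it with the fields home=Mosby and "favorite flavor"=flies yields the same 32-byte body as the example above.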

GET vs POST Methods

If you were writing a CGI script directly, i.e. not using PHP but Perl, shell, C, or another language, you would have to pay attention to where you get the user's variable/value pairs. In the case of GET, you would read them from the QUERY_STRING environment variable. In the case of POST, you would read them from standard input, using the CONTENT_LENGTH environment variable to know how many bytes to read, and then parse for special characters to extract each variable and its value.
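
Here is a rough Python version of that logic. It assumes the web server passes the standard CGI variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH. The function name read_form is made up, and the environment and input stream are passed in as arguments so the function can be tried without a web server.

```python
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Return the submitted form fields as a dict of name -> list of values."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the data arrives on standard input; read exactly
        # CONTENT_LENGTH bytes and no more.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # GET: the data is the part of the URL after the '?'.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw, keep_blank_values=True)
```

In a real CGI script this would be called as read_form(os.environ, sys.stdin.buffer).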

POST Method:

  • Query length can be unlimited (unlike in GET)
  • Is used to send a chunk of data to the server to be processed.
  • You can send entire files using post.
  • Your form data is attached to the end of the POST request (as opposed to the URL).
  • Not as quick and easy as using GET, but more versatile (provided that you are writing the CGI directly).

GET Method :

  • Your entire form submission can be encapsulated in one URL, like a hyperlink, so a query can be stored as just a URL.
  • You can access the CGI program with a query without using a form.
  • The form data is fully included in the URL: http://myhost.com/mypath/myscript.cgi?name1=value1&name2=value2
  • Is how your browser downloads most files.
  • Don't use GET if you want to log each request.
  • Is used to get a file or other resource.

    Header lines provide information about the request or response, or about the object sent in the message body. This section lists the header fields available in HTTP version 1.0.

    Allow

    The Allow entity-header field lists the set of methods supported by the resource identified by the Request-URI. The purpose of this field is strictly to inform the recipient of valid methods associated with the resource.
    Example
    Allow: GET, HEAD

    Authorization

    The Authorization field value consists of credentials containing the authentication information of the user agent for the realm of the resource being requested.
    Example
    Authorization: credentials
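
As one concrete case, the sketch below builds the header for the Basic authentication scheme, where the credentials are the Base64 encoding of "userid:password". This assumes the server asked for Basic authentication, and the helper name basic_authorization is invented for the example.

```python
import base64


def basic_authorization(user: str, password: str) -> str:
    """Build an Authorization header line for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password.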

    Content-Encoding

    The Content-Encoding entity-header field is used as a modifier to the media-type. When present, its value indicates what additional content coding has been applied to the resource, and thus what decoding mechanism must be applied in order to obtain the media-type referenced by the Content-Type header field. The Content-Encoding is primarily used to allow a document to be compressed without losing the identity of its underlying media type.
    Example
    Content-Encoding: x-gzip
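
A minimal sketch of how a compressed body and its headers fit together, using Python's gzip module. The helper names encode_body and decode_body are made up, and the media type is fixed to text/html only to keep the example short.

```python
import gzip


def encode_body(body: bytes):
    """Compress a body and return (headers, payload) describing it."""
    payload = gzip.compress(body)
    headers = {
        "Content-Type": "text/html",       # media type is unchanged
        "Content-Encoding": "x-gzip",      # extra coding applied on top
        "Content-Length": str(len(payload)),  # size as actually sent
    }
    return headers, payload


def decode_body(headers, payload: bytes) -> bytes:
    """Undo the content coding named in Content-Encoding, if any."""
    if headers.get("Content-Encoding") in ("gzip", "x-gzip"):
        return gzip.decompress(payload)
    return payload
```

The Content-Type still names the underlying media type, while Content-Length refers to the compressed bytes.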

    Content-Length

    The Content-Length entity-header field indicates the size of the Entity-Body, in decimal number of octets, sent to the recipient or, in the case of the HEAD method, the size of the Entity-Body that would have been sent had the request been a GET.
    Example
    Content-Length: 3495

    Content-Type

    The Content-Type entity-header field indicates the media type of the Entity-Body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
    Example
    Content-Type: text/html

    Date

    The Date general-header field represents the date and time at which the message was originated, having the same semantics as orig-date in RFC 822.
    Example
    Date: Tue, 15 Nov 1994 08:12:31 GMT
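
Dates in this format can be produced with Python's email.utils module, which follows the same RFC 822 conventions. The wrapper name http_date is invented, and it assumes a timezone-aware datetime is passed in.

```python
from datetime import datetime, timezone
from email.utils import formatdate


def http_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    # usegmt=True makes the zone read 'GMT' instead of '-0000'.
    return formatdate(moment.timestamp(), usegmt=True)
```

The current time would be http_date(datetime.now(timezone.utc)).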

    Expires

    The Expires entity-header field gives the date/time after which the entity should be considered stale. This allows information providers to suggest the volatility of the resource, or a date after which the information may no longer be valid.
    Example
    Expires: Thu, 01 Dec 1994 16:00:00 GMT

    From

    The From request-header field, if given, should contain an Internet e-mail address for the human user who controls the requesting user agent. The address should be machine-usable, as defined by mailbox in RFC 822.
    Example
    From: webmaster@w3.org

    If-Modified-Since

    The If-Modified-Since request-header field is used with the GET method to make it conditional: if the requested resource has not been modified since the time specified in this field, a copy of the resource will not be returned from the server; instead, a 304 (not modified) response will be returned without any Entity-Body.
    Example
    If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
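
The server-side decision could look roughly like the sketch below. conditional_get_status is a hypothetical helper. It assumes the resource's modification time is a timezone-aware datetime, and it ignores a header it cannot parse.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200  # ordinary, unconditional GET
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: behave as if the header were absent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

A 304 response is sent with headers only, so the client keeps using its cached copy.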

    Last-Modified

    The Last-Modified entity-header field indicates the date and time at which the sender believes the resource was last modified.
    Example
    Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

    Location

    The Location response-header field defines the exact location of the resource that was identified by the Request-URI. For 3xx responses, the location must indicate the server's preferred URL for automatic redirection to the resource. Only one absolute URL is allowed.
    Example
    Location: http://www.w3.org/hypertext/WWW/NewLocation.html

    Pragma

    The Pragma general-header field is used to include implementation-specific directives that may apply to any recipient along the request/response chain. All pragma directives specify optional behavior from the viewpoint of the protocol; however, some systems may require that behavior be consistent with the directives.
    Syntax
    Pragma = "Pragma" ":" 1#pragma-directive
    pragma-directive = "no-cache" | extension-pragma
    extension-pragma = token [ "=" word ]

    Referer

    The Referer request-header field allows the client to specify, for the server's benefit, the address (URI) of the resource from which the Request-URI was obtained.
    Example
    Referer: http://www.w3.org/hypertext/DataSources/Overview.html

    Server

    The Server response-header field contains information about the software used by the origin server to handle the request. The field can contain multiple product tokens and comments identifying the server and any significant subproducts.
    Example
    Server: CERN/3.0 libwww/2.17

    User-Agent

    The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations.
    Example
    User-Agent: CERN-LineMode/2.15 libwww/2.17b3

    WWW-Authenticate

    The WWW-Authenticate response-header field must be included in 401 (unauthorized) response messages. The field value consists of at least one challenge that indicates the authentication scheme(s) and parameters applicable to the Request-URI.
    Syntax
    WWW-Authenticate = "WWW-Authenticate" ":" 1#challenge

    HTTP STATUS CODES

    This is a list of HTTP status messages that might be returned:

    1xx: Information
    Message                             Description
    100 Continue                        Only a part of the request has been received by the server, but as long as it has not been rejected, the client should continue with the request
    101 Switching Protocols             The server switches protocol

    2xx: Successful
    Message                             Description
    200 OK                              The request is OK
    201 Created                         The request is complete, and a new resource is created
    202 Accepted                        The request is accepted for processing, but the processing is not complete
    203 Non-authoritative Information
    204 No Content
    205 Reset Content
    206 Partial Content

    3xx: Redirection
    Message                             Description
    300 Multiple Choices                A link list. The user can select a link and go to that location. Maximum five addresses
    301 Moved Permanently               The requested page has moved to a new URL
    302 Found                           The requested page has moved temporarily to a new URL
    303 See Other                       The requested page can be found under a different URL
    304 Not Modified
    305 Use Proxy
    306 Unused                          This code was used in a previous version. It is no longer used, but the code is reserved
    307 Temporary Redirect              The requested page has moved temporarily to a new URL

    4xx: Client Error
    Message                             Description
    400 Bad Request                     The server did not understand the request
    401 Unauthorized                    The requested page needs a username and a password
    402 Payment Required                Reserved for future use
    403 Forbidden                       Access is forbidden to the requested page
    404 Not Found                       The server cannot find the requested page
    405 Method Not Allowed              The method specified in the request is not allowed
    406 Not Acceptable                  The server can only generate a response that is not accepted by the client
    407 Proxy Authentication Required   You must authenticate with a proxy server before this request can be served
    408 Request Timeout                 The request took longer than the server was prepared to wait
    409 Conflict                        The request could not be completed because of a conflict
    410 Gone                            The requested page is no longer available
    411 Length Required                 The "Content-Length" is not defined. The server will not accept the request without it
    412 Precondition Failed             The precondition given in the request was evaluated to false by the server
    413 Request Entity Too Large        The server will not accept the request, because the request entity is too large
    414 Request-URI Too Long            The server will not accept the request, because the URL is too long. This can occur when a POST request is converted to a GET request with long query information
    415 Unsupported Media Type          The server will not accept the request, because the media type is not supported
    416 Requested Range Not Satisfiable
    417 Expectation Failed

    5xx: Server Error
    Message                             Description
    500 Internal Server Error           The request was not completed. The server met an unexpected condition
    501 Not Implemented                 The request was not completed. The server did not support the functionality required
    502 Bad Gateway                     The request was not completed. The server received an invalid response from the upstream server
    503 Service Unavailable             The request was not completed. The server is temporarily overloaded or down
    504 Gateway Timeout                 The gateway has timed out
    505 HTTP Version Not Supported      The server does not support the HTTP protocol version used in the request
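
Because the first digit of a status code identifies its class, a client can react sensibly even to a code it does not know. The sketch below shows this with Python's standard http.HTTPStatus enum. The function name describe_status and the wording of the fallback text are made up for the example.

```python
from http import HTTPStatus

# Class names as used in the list above, keyed by the first digit.
CLASSES = {
    1: "Information",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Describe a status code by its reason phrase and its class."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a code Python knows; the class still tells us how to react.
        phrase = "(no standard reason phrase)"
    return f"{code} {phrase} [{category}]"
```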

HTTP MESSAGE EXAMPLE

To retrieve the file at the URL
http://www.somehost.com/path/file.html
first open a socket to the host www.somehost.com, port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket:
GET /path/file.html HTTP/1.0
From: someuser@tutorialspoint.com
User-Agent: HTTPTool/1.0
[blank line here]
The server should respond with something like the following, sent back through the same socket:
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>Happy New Millennium!</h1>

(more file contents)
  .
  .
  .
</body>
</html>
After sending the response, the server closes the socket.
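
The same exchange can be sketched with Python's socket module. The function names build_request and fetch are invented for this example, www.somehost.com is the placeholder host from the text above, and the code assumes an HTTP/1.0 server that closes the socket after responding.

```python
import socket


def build_request(path: str, headers: dict) -> bytes:
    """Build a GET request: request line, headers, then a blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host: str, path: str, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send one GET request and return the raw response bytes."""
    request = build_request(path, {"User-Agent": "HTTPTool/1.0"})
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the socket: response complete
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

fetch("www.somehost.com", "/path/file.html") would return the status line, headers, and body as one byte string, provided such a host exists and answers on port 80.
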
To familiarize yourself with requests and responses, experiment with HTTP manually using telnet.

Manually Experimenting with HTTP

Using telnet, you can open an interactive socket to an HTTP server. This lets you manually enter a request, and see the response written to your screen. It's a great help when learning HTTP, to see exactly how a server responds to a particular request. It also helps when troubleshooting.
From a Unix prompt, open a connection to an HTTP server with something like
telnet www.somehost.com 80
Then enter your request line by line, like
GET /path/file.html HTTP/1.0
[headers here, if any]
[blank line here]
After you finish your request with the blank line, you'll see the raw response from the server, including the status line, headers, and message body.
