Very basic primer for the HTML language
Last revision July 20, 2004
HTML is a "mark-up language", similar in principle to the computer typesetting language TeX, but vastly simplified. You embed commands, called "tags", within the document text to define the logical parts of the document, such as paragraphs, lists, headings, links to other pages, etc.
The Web browser then interprets these tags to format the page for viewing. Browsers may differ in how they format, for example, by using different styles or sizes of fonts, or different amounts of white space or indentation around sections. Many browsers let the user select these formatting characteristics. The key thing to remember is that the browser will be formatting the document for the viewer. It will ignore any text formatting that you do directly, such as indenting lines or breaking text into lines of certain lengths, unless it is done with an HTML tag.
HTML tags all follow this same format:
<tagname attributes>
They start with the left-pointing arrow bracket and end with the right-pointing one. The tagname defines the formatting or link. Some tags accept optional attributes that allow further control.
Because the left and right arrow bracket characters (< and >) are used to
delimit tags, they cannot be used in ordinary text. Instead, if you want to
show one of these bracket characters in your text, you must use the special
symbolic names:
&lt; | for < |
&gt; | for > |
There are many special characters, such as foreign language accented letters,
that may be used in your document with the "&" prefix. You will need to check
an HTML reference to see the list. The ampersand (&) character itself cannot
be used directly in your text because it is used for referencing special characters.
To use an actual ampersand in your text, use the symbolic name
&amp;
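As a small illustration of the symbolic names &lt;, &gt;, and &amp;, the fragment below is a sketch of how text containing the three reserved characters might be typed into a document. The sentences themselves are made up for this example.

```html
The comparison 3 &lt; 5 is true, and so is 5 &gt; 3.
<p>
Write Research &amp; Development, not Research & Development.
```

A browser should display the first line as "The comparison 3 < 5 is true, and so is 5 > 3." and show a single ampersand where &amp; was typed. The bare ampersand at the end of the last line is the form to avoid.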
HTML tags may be freely interspersed with the text. Either of these input documents would create the same output in the browser:
This is a sentence with <em>emphasized</em> text.

This is a sentence with
<em>emphasized</em>
text.
Tag names and attributes may be in either upper or lower case (or a mixture). For example, you can use all uppercase letters for tag names to make them stand out when editing the file. But proposed future extensions to HTML may require that tags be all lower case.
Most HTML tags are used in pairs, one to begin and another to end the text to
which they are meant to apply. Attributes may only go into the beginning tag.
The ending tag has the same tag name as the beginning, but prefixed with the
slash (/) character. For example, the tag to define an unordered list (items
shown with bullet characters) is
"<ul>".
At the beginning of the list, you would have the beginning tag
<ul>
At the end of the list, you would have the corresponding ending tag
</ul>
If you are missing an ending tag, your Web browser is not going to complain. It is just going to mis-format your document.
Other HTML tags are used singly, because they simply mark a location in the
text where some operation is to occur, rather than needing to span or delimit a
section of text. The two most common ones are:
<p> | indicates start of a new paragraph |
<li> | indicates start of a new list element |
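To show paired and single tags working together, here is a minimal sketch of an unordered list. The item text is invented for the example; only the tags come from the descriptions above.

```html
<ul>
<li>First item
<li>Second item
<p>A second paragraph that still belongs to the second item.
</ul>
```

The <ul> and </ul> pair delimits the whole list, while each <li> and the <p> simply mark the point where a new item or paragraph starts.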
Certain HTML tags are required in all documents. If you have only simple text,
and you want the browser to make all decisions about how to format, you only
need these:
<html> and </html> | at the very beginning and end of the entire document. |
<head> and </head> | to delimit the head section, where the title of the page is defined. |
<body> and </body> | to delimit the actual "text" or content of the page. |
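Put together, the required tags give a skeleton like the following. This is only a sketch: the title and body text are placeholders, and the <title> pair is included because the head section is where the title is defined.

```html
<html>
<head>
<title>My first page</title>
</head>
<body>
Just some plain text. The browser decides how to lay it out,
so these line breaks in the source make no difference.
</body>
</html>
```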
In practice, that minimum set of tags doesn't get you very far. It doesn't even allow you to specify paragraphs; all your text would appear as a single long paragraph. So you will almost certainly use the following additional formatting tags within your document. Type-style tags (such as <em>) can be nested within each other, with cumulative effects.
<title> and </title> | Put a short title between these two delimiters which the browser will show at the top of the window. The title must go into the "HEAD" section of the document. |
<p> | Use this tag by itself to indicate the start of a new paragraph. The browser typically puts a blank line between paragraphs. |
<br> | Breaks the line without starting a new paragraph. |
&nbsp; | This symbol is used to create a "non-breakable space" (that is, white space in the line that is the width of one average character). Use it to force a space where the browser would otherwise remove it. For example, you can use a series of five of these in a row at the beginning of a line to force indentation of that line by five spaces. |
<h1> and </h1>, <h2> and </h2> ... up to ... <h6> and </h6> | Use sets of <hn> tags to set off text within the BODY section as a heading line. <h1> uses the largest type; <h6> the smallest. In all cases, the heading line text goes on a separate line and is set off with white space before and after. |
<ul> and </ul>, <ol> and </ol>, <li> | Use these sets of tags to define a section of text that is to be organized in list format. <ul> tags indicate an unordered list, where each new list item is marked with a bullet or other symbol. <ol> tags indicate an ordered list, where each new list item is marked with a number. List sections are usually indented and set off with white space before and after. Within the list section, use the <li> tag to indicate the start of the next list item. A list item can have multiple paragraphs, each set off by a <p> tag. A list with no <li> tags provides a way to just indent a section of text. |
<em> and </em> | Indicate text to be rendered in an emphasized type style, usually by using italics. |
<strong> and </strong> | Indicate text to be rendered in a strongly emphasized type style, usually by using boldface type. |
<samp> and </samp> | Indicate text that is "sample output", such as from a computer program. Rendered in a fixed-space (typewriter-like) font. |
<pre> and </pre> | Set off a section of text that is "pre-formatted" and to be rendered with indentation, amount of text on each line, etc., exactly as found in the input document. A fixed-space (typewriter-like) font will be used and tabs are normally set at 8 character intervals, although some browsers (for example, Internet Explorer 5 on the Macintosh) do not properly format tabs. Replace tabs with repeated blanks to be certain of the spacing. Some additional HTML tags (such as <em> and </em>) can be used within the pre-formatted text section and will be correctly interpreted. |
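The formatting tags in the table above can be combined in one small page. The following is a sketch only; the page content (a made-up hiking note) is hypothetical, and the exact appearance will vary from browser to browser.

```html
<html>
<head>
<title>Weekend hike</title>
</head>
<body>
<h1>Weekend hike</h1>
<p>
This paragraph contains <em>emphasized</em> text,
<strong>strongly emphasized</strong> text, and
<strong><em>both styles nested</em></strong>.
<p>
This is a second paragraph.<br>
This sentence starts on a new line but stays in the same paragraph.
<h2>Things to bring</h2>
<ul>
<li>Water
<li>A map
</ul>
<h2>Steps</h2>
<ol>
<li>Drive to the trailhead.
<li>Follow the marked path.
</ol>
<p>
The program printed <samp>3 items packed</samp> when it finished.
<pre>
item      count
bottles       2
maps          1
</pre>
<p>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;This line is indented by five spaces.
</body>
</html>
```

Only the text inside <pre> keeps its line breaks and column spacing as typed; everywhere else the browser reflows the text and relies on the tags alone.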
The "<img>" tag is used to include a graphic image file. It is used by itself, not in a pair. It can appear anywhere in the BODY section of a document, interspersed with the text. Browsers will automatically download and display such images wherever this tag is encountered in the document.
The image inclusion tag has the syntax:
<img src="URL" alt="alternate-text">
The URL (Uniform Resource Locator) refers to the location of the image file. The ALT attribute is optional. It can be used to specify some text that will be presented instead of the image if the browser cannot display graphic images (such as the lynx browser on pangea).
Graphical browsers normally can display graphics directly if they are stored in either the GIF or JPEG format. GIF works better for line art; JPEG is better for photographic images. Save GIF image files with the suffix .gif as part of the filename. Save JPEG image files with the suffix .jpeg or .jpg as part of the filename.
The URL can refer to an image file on this same server or another server
somewhere else on the network. An example of a reference to a graphic image on
another site is:
<img src="http://www-pcd.stanford.edu/gifs/line.red2.gif">
If the file is in the same directory as the text document which references it,
then the URL can consist of just the plain filename without the
"http://..."
part, for example:
<img src="dog.gif">
An example of the ALT attribute would be the tag:
<img src="dog.gif" alt="Look how cute our new puppy is!">
Finally, you use the <a> tag, in two different forms, to either create a link to another document (or a link to another section of the current document), or to mark sections of your document with "anchors" so links can be made directly to those sections. This tag is used within the BODY section, interspersed with the text.
To create a link to another document (or to link to another "anchor" within the current document), use this pair of tags:
<a href="URL-or-anchor"> text </a>
These tags enclose a section of text or a graphic image, which then becomes the link. When the user selects (clicks on) any word in the associated text, or on the associated image, then the link is activated and the browser fetches and displays the new page indicated by the link. Browsers usually indicate that a section of text is a link by underlining it and possibly changing the color of the text. A graphic image that is itself a link to another document will be outlined with a colored or dark border.
URL-or-anchor refers to either a valid URL on this or any other server, or to an "anchor" mark within the current document, made by the other form of the <a> tag (below). Be sure to enclose it within quotes. The URL syntax is briefly described above in the description of image tags, or in detail on the Uniform Resource Locator web page. The syntax for referring to an anchor within the current document is simply "#anchor-name", that is, precede the anchor name with the hash mark (#).
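The three kinds of link described here might look like the following sketch. The www.example.com addresses and the anchor name "section1" are hypothetical placeholders, and dog.gif is assumed to sit in the same directory as the page.

```html
Read the <a href="http://www.example.com/guide.html">full guide</a> for details.
<p>
<a href="http://www.example.com/dog.html"><img src="dog.gif" alt="Our puppy"></a>
<p>
Jump to the <a href="#section1">first section</a> of this page.
```

The first link uses text as the clickable area, the second uses an image, and the third points at an anchor inside the current document rather than at another page.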
The other form of the <a> tag can be used to create an "anchor" mark within your document. Use this pair of tags to bracket the text or image that is to be the anchor:
<a name="anchor"> text </a>
The "anchor" can be any arbitrary name that you like, such as "section1" or "picture1".
The text or graphic image included between the <a> tags in this form becomes the anchor location in the document, and will be displayed at the top of the window when the browser jumps to that location in the document. It is also possible to have a "null" anchor, that is, no text between the tags, in which case the text immediately following appears at the top of the window.
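As a rough sketch of how the two forms of the <a> tag fit together, the fragment below links to one anchor that encloses text and to one "null" anchor. The anchor names follow the examples given above; the surrounding text and the file dog.gif are assumptions made for illustration.

```html
<a href="#section1">Go to section 1</a><br>
<a href="#picture1">Go to the picture</a>

<h2><a name="section1">Section 1</a></h2>
<p>
Text of the first section.

<a name="picture1"></a>
<img src="dog.gif" alt="Our puppy">
```

Selecting the first link should bring the "Section 1" heading to the top of the window. The second anchor is empty, so the image that follows it is what appears at the top.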