Java Fundamentals
Lesson 2: It all starts with tokens
Objective: Explain the fundamental coding element in Java programs.

Fundamental Coding Elements

When a program is processed by the Java compiler, it is first broken down into tokens. A token is the smallest code element in a program that is meaningful to the compiler.
The following line of Java code contains five tokens:
boolean busy = true;

The tokens in this example are boolean, busy, =, true, and ;.
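To make the split concrete, here is a minimal sketch that recovers the same five tokens with a regular expression. The class name TokenSketch and its tokenize method are hypothetical names chosen for illustration, and the pattern is a deliberate simplification: it is not how the Java compiler's lexer works, and it does not handle string literals, comments, or multi-character operators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a toy tokenizer, not the real Java lexer.
public class TokenSketch {
    // A word (identifier, keyword, or boolean literal), an integer literal,
    // or any single non-whitespace symbol. Whitespace matches nothing,
    // so it is skipped between tokens.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    public static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [boolean, busy, =, true, ;]
        System.out.println(tokenize("boolean busy = true;"));
    }
}
```

Removing the spaces around the = sign in the input would still yield the same five tokens, because the sketch, like the compiler, uses whitespace only to tell tokens apart.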
Understanding tokens is critical, because tokens describe the fundamental structure of the Java programming language. Java tokens can be divided into five categories:
  1. Identifiers
  2. Keywords
  3. Literals
  4. Operators
  5. Separators

  1. Identifiers: Tokens that represent names, such as variable, method, and class names
  2. Keywords: Reserved words set aside as programming constructs; they cannot be used as identifiers
  3. Literals: Program data elements that are constant
  4. Operators: Programming constructs used to specify an evaluation or computation
  5. Separators: Symbols that tell the Java compiler how code elements are grouped
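As a rough illustration of the five categories, the sketch below assigns a category to each token of the earlier example line. The class TokenCategories, its classify method, and the tiny keyword and separator sets are all assumptions made for this example; the real keyword list is much longer, and a real compiler classifies tokens while scanning rather than from isolated strings.

```java
import java.util.Set;

// Hypothetical illustration only: a simplified token classifier.
public class TokenCategories {
    // A small, deliberately incomplete sample of Java keywords.
    private static final Set<String> KEYWORDS =
        Set.of("boolean", "int", "class", "if", "else", "return", "while");

    // A sample of common separator symbols.
    private static final Set<String> SEPARATORS =
        Set.of("(", ")", "{", "}", "[", "]", ";", ",", ".");

    public static String classify(String token) {
        if (KEYWORDS.contains(token)) {
            return "keyword";
        }
        // true, false, and null are literals, as are integer constants.
        if (token.equals("true") || token.equals("false")
                || token.equals("null") || token.matches("\\d+")) {
            return "literal";
        }
        if (SEPARATORS.contains(token)) {
            return "separator";
        }
        if (token.matches("[A-Za-z_$][A-Za-z0-9_$]*")) {
            return "identifier";
        }
        // Anything left over is treated as an operator in this sketch.
        return "operator";
    }

    public static void main(String[] args) {
        for (String token : new String[] {"boolean", "busy", "=", "true", ";"}) {
            System.out.println(token + " -> " + classify(token));
        }
    }
}
```

For the line boolean busy = true; this reports one token in each category: boolean is a keyword, busy an identifier, = an operator, true a literal, and ; a separator.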

Coding elements that are not considered tokens include comments and whitespace (spaces, tabs, and line terminators). The Java compiler uses them only to separate tokens and otherwise ignores them.
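The following sketch shows this in practice: two methods made of the same tokens, one written as compactly as possible and one padded with extra whitespace and comments. The class and method names (WhitespaceDemo, compact, spacious) are hypothetical and exist only for this illustration.

```java
// Hypothetical illustration: layout and comments do not change the tokens.
public class WhitespaceDemo {
    // Minimal whitespace: only the spaces needed to separate adjacent words.
    public static boolean compact(){boolean busy=true;return busy;}

    // Same tokens, spread out and interleaved with comments.
    public static boolean spacious() {
        boolean   busy   /* inline comment */   =
            true ;   // end-of-line comment
        return busy;
    }

    public static void main(String[] args) {
        // Both methods behave identically, so this prints: true
        System.out.println(compact() == spacious());
    }
}
```

Note that some whitespace is still required: the space between boolean and busy cannot be removed, because booleanbusy would be read as a single identifier.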

Unicode Character Set

Java programs are written using Unicode. You can use Unicode characters anywhere in a Java program, including comments and identifiers such as variable names. Unlike the 7-bit ASCII character set, which is useful only for English, and the 8-bit ISO Latin-1 character set, which is useful only for major Western European languages, the Unicode character set can represent virtually every written language in common use on the planet. If you do not use a Unicode-enabled text editor, or if you do not want to force other programmers who view or edit your code to use one, you can embed Unicode characters into your Java programs using the special Unicode escape sequence \uxxxx: a backslash and a lowercase u, followed by four hexadecimal digits. For example, \u0020 is the space character, and \u03c0 is the character π (the lowercase Greek letter pi; the uppercase Π is \u03a0).
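A short sketch of Unicode escapes in use, assuming a hypothetical class named UnicodeEscapeDemo with helper methods smallPi and capitalPi. Because the compiler translates escapes before it splits the source into tokens, an escape can appear in a string, in a character literal, or even in an identifier. Escapes are also processed inside comments, which is why the comments below never spell out an incomplete escape.

```java
// Hypothetical illustration of Unicode escape sequences.
public class UnicodeEscapeDemo {
    // Returns the one-character string GREEK SMALL LETTER PI (U+03C0).
    public static String smallPi() {
        return "\u03c0";
    }

    // Returns the one-character string GREEK CAPITAL LETTER PI (U+03A0).
    public static String capitalPi() {
        return "\u03a0";
    }

    public static void main(String[] args) {
        // The identifier on the next line is the single Greek letter pi,
        // written with an escape instead of the character itself.
        double \u03c0 = Math.PI;

        // An escape inside a character literal: the space character.
        char space = '\u0020';

        System.out.println(space == ' ');                        // prints true
        System.out.println(smallPi().codePointAt(0) == 0x3c0);   // prints true
        System.out.println(capitalPi().codePointAt(0) == 0x3a0); // prints true
        System.out.println(\u03c0 == Math.PI);                   // prints true
    }
}
```

The comparisons print booleans rather than the Greek letters themselves, so the output does not depend on whether the console can display non-ASCII characters.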