Java Fundamentals
Lesson 2: It all starts with tokens
Objective: Explain the fundamental coding elements in Java programs.

Fundamental Coding Elements

When a program is processed by the Java compiler, it is first broken down into tokens. A token is the smallest code element in a program that is meaningful to the compiler. The following line of Java code contains five tokens:
  
boolean busy = true;
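
As a rough illustration of how this line splits into tokens, here is a minimal sketch of a toy scanner. It is not how the Java compiler's lexer works; the class name TokenSketch, the method tokenize, and the regular expression are assumptions made for this example, and the pattern only copes with simple lines like the one above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy scanner for illustration only (hypothetical helper, not a real lexer).
class TokenSketch {
    // A word (identifier, keyword, or a literal such as true), a run of
    // digits, or any single non-whitespace character (operator or separator).
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    // Returns the pieces of the line in the order they appear.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [boolean, busy, =, true, ;]
        System.out.println(tokenize("boolean busy = true;"));
    }
}
```

The whitespace between the words is skipped by the pattern, so it never appears in the resulting list.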

The tokens in this example are boolean, busy, =, true, and ;. Understanding tokens is critical, because tokens describe the fundamental structure of the Java programming language. Java tokens can be divided into five categories:

Java Identifiers

Identifiers are tokens that represent names. They are used a great deal in Java programming, since many parts of a program require names. Along with making Java programs easier to understand, identifiers are important because they uniquely identify parts of a program.
Java identifiers must begin with one of the following:
  1. a letter,
  2. an underscore ( _ ), or
  3. a dollar sign ($).
Letters can be uppercase or lowercase.

Java is case sensitive, which means that the identifiers Ernie, ernie, and ERNIE are three distinct names. Characters after the first can also include the digits 0 to 9. The only other restriction is that an identifier cannot be the same as a Java keyword such as class, if, or return.
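
The rules above can be sketched with two character tests from the standard library, Character.isJavaIdentifierStart and Character.isJavaIdentifierPart. The class name IdentifierStartDemo and the helper methods canStart and canFollow are hypothetical names chosen for this example.

```java
// Sketch of the first-character and later-character rules for identifiers.
class IdentifierStartDemo {
    // True if c may be the first character of an identifier.
    static boolean canStart(char c) {
        return Character.isJavaIdentifierStart(c);
    }

    // True if c may appear after the first character.
    static boolean canFollow(char c) {
        return Character.isJavaIdentifierPart(c);
    }

    public static void main(String[] args) {
        // Three distinct variables whose names differ only in case.
        int Ernie = 1;
        int ernie = 2;
        int ERNIE = 3;
        System.out.println(Ernie + ernie + ERNIE); // prints 6

        System.out.println(canStart('a'));  // prints true
        System.out.println(canStart('_'));  // prints true
        System.out.println(canStart('$'));  // prints true
        System.out.println(canStart('7'));  // prints false
        System.out.println(canFollow('7')); // prints true
    }
}
```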


Java Identifiers and Keywords

Classes, variables, and methods require names. In Java, these names are called identifiers, and, as you might expect, there are rules for what constitutes a legal Java identifier. Beyond what is legal, Java and Oracle programmers have created conventions for naming methods, variables, and classes. Like all programming languages, Java has a set of built-in keywords. These keywords must not be used as identifiers. Later in this chapter we will review the details of these naming rules, conventions, and the Java keywords.
Technically, legal identifiers must be composed of only Unicode characters, numbers, currency symbols, and connecting characters (such as underscores). The exam does not dive into the details of which ranges of the Unicode character set qualify as letters and digits. So, for example, you will not need to know that Tibetan digits range from \u0f20 to \u0f29. Here are the rules you do need to know:
  1. Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore (_). Identifiers cannot start with a digit. After the first character, identifiers can contain any combination of letters, currency characters, connecting characters, or digits.
  2. In practice, there is no limit to the number of characters an identifier can contain.
  3. You cannot use a Java keyword as an identifier. Table 1-1 lists all of the Java keywords.
  4. Identifiers in Java are case-sensitive; foo and FOO are two different identifiers.
Examples of legal and illegal identifiers follow. First some legal identifiers:
int _a;
int $c;
int ______2_w;
int _$;
int this_is_a_very_detailed_name_for_an_identifier;

The following are illegal (it's your job to recognize why):
int :b;
int -d;
int e#;
int .f;
int 7g;
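
One way to check names like these programmatically is with javax.lang.model.SourceVersion from the standard library. The following is a minimal sketch; the class name IdentifierCheck and the method isLegal are hypothetical, and "legal" here is taken to mean well-formed as an identifier and not a reserved word.

```java
import javax.lang.model.SourceVersion;

// Sketch: test candidate names against the identifier rules.
class IdentifierCheck {
    // Well-formed as an identifier, and not a keyword or reserved literal.
    static boolean isLegal(String name) {
        return SourceVersion.isIdentifier(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        String[] names = {
            "_a", "$c", "______2_w", "_$",   // legal examples from the text
            ":b", "-d", "e#", ".f", "7g",    // illegal examples from the text
            "class"                          // well-formed, but a keyword
        };
        for (String name : names) {
            System.out.println(name + " -> " + isLegal(name));
        }
    }
}
```

The first four names report true and the remaining six report false.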

The other four token categories are:
  1. keywords,
  2. literals,
  3. operators, and
  4. separators.

Java Separators group Coding Elements

Separators are tokens used by the Java compiler to group other coding elements. For example, commas are separators used to separate a list of items, much like a list of words in a sentence. Following are the separators used in Java:
( ) { } [ ] ; , . :

Purpose of Java Separators
  1. ( ) encloses parameters in method definitions and arguments in method calls; adjusts precedence in arithmetic expressions; surrounds cast types; and delimits test expressions in flow control statements
  2. { } defines blocks of code and encloses the values of array initializers
  3. [ ] declares array types and accesses array elements
  4. ; terminates statements
  5. , separates successive identifiers in variable declarations; chains expressions in the initialization and update parts of a for loop
  6. . selects a field or method from an object; separates package names from sub-package and class names
  7. : used after loop labels
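
The separators listed above can be seen together in a short program. This is only a sketch; the class name SeparatorDemo and the methods sumUntilNegative and isPalindrome are hypothetical names invented for the example.

```java
// Sketch showing each separator in a typical position.
class SeparatorDemo {
    // ( ) enclose the parameter list; { } delimit the method body.
    static int sumUntilNegative(int[][] grid) {
        int total = 0, rows = grid.length;   // , separates declared names; ; ends the statement
        outer:                               // : follows the loop label
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < grid[i].length; j++) {   // [ ] index the array; . selects a field
                if (grid[i][j] < 0) {
                    break outer;             // leaves both loops at once
                }
                total += grid[i][j];
            }
        }
        return total;
    }

    // , chains expressions in the initialization and update parts of the loop.
    static boolean isPalindrome(int[] values) {
        for (int lo = 0, hi = values.length - 1; lo < hi; lo++, hi--) {
            if (values[lo] != values[hi]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] grid = { {1, 2}, {3, -1, 5}, {7} };   // { } enclose array initializer values
        int scaled = (int) (2.5 * (1 + 1));           // ( ) for a cast and for precedence
        System.out.println(sumUntilNegative(grid));   // prints 6
        System.out.println(scaled);                   // prints 5
        System.out.println(isPalindrome(new int[] {1, 2, 1})); // prints true
    }
}
```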

Terminator versus a Separator in Java

There is a distinction between terminator and separator.
  1. The comma between identifiers in declarations is a separator because it comes between elements in the list.
  2. The semicolon is a terminator because it ends each statement.
If the semicolon were a statement separator, the last semicolon in a code block would be unnecessary and (depending on the choice of the language designer) possibly invalid.


Examples of Java Tokens

  1. Identifiers: Tokens that represent names
  2. Keywords: Special identifiers set aside as programming constructs
  3. Literals: Program data elements that are constant
  4. Operators: Programming constructs used to specify an evaluation or computation
  5. Separators: Symbols to inform the Java compiler of how code elements are grouped



Coding elements that are not considered tokens include comments and whitespace (spaces, tabs, and end-of-lines), which are ignored by the Java compiler.
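
A small sketch of this point: the two methods below contain the same tokens in the same order, so they behave identically even though one is spread across several lines and interleaved with comments. The class name WhitespaceDemo and its method names are hypothetical.

```java
// Sketch: whitespace and comments between tokens do not change the program.
class WhitespaceDemo {
    static boolean compact() {
        boolean busy=true;
        return busy;
    }

    static boolean spreadOut() {
        boolean   /* a comment between two tokens */   busy
                =            // a comment at the end of a line
                true
                ;
        return busy;
    }

    public static void main(String[] args) {
        System.out.println(compact() == spreadOut()); // prints true
    }
}
```

Note that some whitespace (or a comment) is still needed between boolean and busy, because otherwise the two words would be read as a single identifier.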

Unicode Character Set

Java programs are written using Unicode. You can use Unicode characters anywhere in a Java program, including comments and identifiers such as variable names. Unlike the 7-bit ASCII character set, which is useful only for English, and the 8-bit ISO Latin-1 character set, which is useful only for major Western European languages, the Unicode character set can represent virtually every written language in common use on the planet.
If you do not use a Unicode-enabled text editor, or if you do not want to force other programmers who view or edit your code to use one, you can embed Unicode characters into your Java programs using the special Unicode escape sequence \uxxxx: a backslash and a lowercase u, followed by four hexadecimal digits. For example, \u0020 is the space character, and \u03c0 is the character π.
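
As a brief sketch of Unicode escapes in practice, the example below uses an escape both inside a character literal and as an identifier. The class name UnicodeEscapeDemo and its methods are hypothetical. The source file itself stays pure ASCII, so it does not depend on the editor's encoding.

```java
// Sketch: Unicode escapes in a character literal and in an identifier.
class UnicodeEscapeDemo {
    // The escape in this literal denotes the Greek small letter pi.
    static char pi() {
        return '\u03c0';
    }

    // The same escape used as a name: it declares a local variable whose
    // identifier is the single character pi.
    static double circleArea(double radius) {
        double \u03c0 = Math.PI;
        return \u03c0 * radius * radius;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(pi()));   // prints 3c0
        System.out.println('\u0020' == ' ');             // prints true
        System.out.println(circleArea(1.0) == Math.PI);  // prints true
    }
}
```

Because escapes are translated before the source is split into tokens, they are also processed inside comments, so a malformed escape in a comment is a compile-time error.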
