Learning Resources
 

Scanning and Formatting

Scanning

Objects of type Scannerare useful for breaking down formatted input into tokens and translating individual tokens according to their data type.

Breaking Input into Tokens

By default, a scanner uses white space to separate tokens. (White space characters include blanks, tabs, and line terminators. For the full list, refer to the documentation for Character.isWhitespace.) To see how scanning works, let's look at ScanXan, a program that reads the individual words in xanadu.txtand prints them out, one per line.

import java.io.*;
import java.util.Scanner;

public class ScanXan {
    public static void main(String[] args) throws IOException {

        Scanner s = null;

        try {
            s = new Scanner(new BufferedReader(new FileReader("xanadu.txt")));

            while (s.hasNext()) {
                System.out.println(s.next());
            }
        } finally {
            if (s != null) {
                s.close();
            }
        }
    }
}

Notice that ScanXaninvokes Scanner's closemethod when it is done with the scanner object. Even though a scanner is not a stream, you need to close it to indicate that you're done with its underlying stream.

The output of ScanXanlooks like this:

In
Xanadu
did
Kubla
Khan
A
stately
pleasure-dome
...

To use a different token separator, invoke useDelimiter(), specifying a regular expression. For example, suppose you wanted the token separator to be a comma, optionally followed by white space. You would invoke,

s.useDelimiter(",\\s*");

Translating Individual Tokens

The ScanXanexample treats all input tokens as simple Stringvalues. Scanneralso supports tokens for all of the Java language's primitive types (except for char), as well as BigIntegerand BigDecimal. Also, numeric values can use thousands separators. Thus, in a USlocale, Scannercorrectly reads the string "32,767" as representing an integer value.

We have to mention the locale, because thousands separators and decimal symbols are locale specific. So, the following example would not work correctly in all locales if we didn't specify that the scanner should use the USlocale. That's not something you usually have to worry about, because your input data usually comes from sources that use the same locale as you do. But this example is part of the Java Tutorial and gets distributed all over the world.

The ScanSumexample reads a list of doublevalues and adds them up. Here's the source:

import java.io.FileReader;
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Scanner;
import java.util.Locale;

public class ScanSum {
    public static void main(String[] args) throws IOException {

        Scanner s = null;
        double sum = 0;

        try {
            s = new Scanner(new BufferedReader(new FileReader("usnumbers.txt")));
            s.useLocale(Locale.US);

            while (s.hasNext()) {
                if (s.hasNextDouble()) {
                    sum += s.nextDouble();
                } else {
                    s.next();
                }   
            }
        } finally {
            s.close();
        }

        System.out.println(sum);
    }
}

And here's the sample input file, usnumbers.txt

8.5
32,767
3.14159
1,000,000.1

The output string is "1032778.74159". The period will be a different character in some locales, because System.outis a PrintStreamobject, and that class doesn't provide a way to override the default locale. We could override the locale for the whole program — or we could just use formatting, as described in the next topic, Formatting.

Formatting

Stream objects that implement formatting are instances of either PrintWriter, a character stream class, or PrintStream, a byte stream class.


Note: The only PrintStreamobjects you are likely to need are System.outand System.err. (See I/O from the Command Line for more on these objects.) When you need to create a formatted output stream, instantiate PrintWriter, not PrintStream.

Like all byte and character stream objects, instances of PrintStreamand PrintWriterimplement a standard set of writemethods for simple byte and character output. In addition, both PrintStreamand PrintWriterimplement the same set of methods for converting internal data into formatted output. Two levels of formatting are provided:

  • print and printlnformat individual values in a standard way.
  • format formats almost any number of values based on a format string, with many options for precise formatting.

The printand printlnMethods

Invoking printor printlnoutputs a single value after converting the value using the appropriate toStringmethod. We can see this in the Rootexample:

public class Root {
    public static void main(String[] args) {
        int i = 2;
        double r = Math.sqrt(i);
        
        System.out.print("The square root of ");
        System.out.print(i);
        System.out.print(" is ");
        System.out.print(r);
        System.out.println(".");

        i = 5;
        r = Math.sqrt(i);
        System.out.println("The square root of " + i + " is " + r + ".");
    }
}

Here is the output of Root:

The square root of 2 is 1.4142135623730951.
The square root of 5 is 2.23606797749979.

The iand rvariables are formatted twice: the first time using code in an overload of print, the second time by conversion code automatically generated by the Java compiler, which also utilizes toString. You can format any value this way, but you don't have much control over the results.

The formatMethod

The formatmethod formats multiple arguments based on a format string. The format string consists of static text embedded with format specifiers; except for the format specifiers, the format string is output unchanged.

Format strings support many features. In this tutorial, we'll just cover some basics. For a complete description, see format string syntaxin the API specification.

The Root2example formats two values with a single formatinvocation:

public class Root2 {
    public static void main(String[] args) {
        int i = 2;
        double r = Math.sqrt(i);
        
        System.out.format("The square root of %d is %f.%n", i, r);
    }
}

Here is the output:

The square root of 2 is 1.414214.

Like the three used in this example, all format specifiers begin with a %and end with a 1- or 2-character conversion that specifies the kind of formatted output being generated. The three conversions used here are:

  • d formats an integer value as a decimal value.
  • f formats a floating point value as a decimal value.
  • n outputs a platform-specific line terminator.

Here are some other conversions:

  • x formats an integer as a hexadecimal value.
  • s formats any value as a string.
  • tB formats an integer as a locale-specific month name.

There are many other conversions.


Note: 

Except for %%and %n, all format specifiers must match an argument. If they don't, an exception is thrown.

In the Java programming language, the \nescape always generates the linefeed character (\u000A). Don't use \nunless you specifically want a linefeed character. To get the correct line separator for the local platform, use %n.


In addition to the conversion, a format specifier can contain several additional elements that further customize the formatted output. Here's an example, Format, that uses every possible kind of element.

public class Format {
    public static void main(String[] args) {
        System.out.format("%f, %1$+020.10f %n", Math.PI);
    }
}

Here's the output:

3.141593, +00000003.1415926536

The additional elements are all optional. The following figure shows how the longer specifier breaks down into elements.

Elements of a format specifier

 

Elements of a Format Specifier.

The elements must appear in the order shown. Working from the right, the optional elements are:

  • Precision. For floating point values, this is the mathematical precision of the formatted value. For sand other general conversions, this is the maximum width of the formatted value; the value is right-truncated if necessary.
  • Width. The minimum width of the formatted value; the value is padded if necessary. By default the value is left-padded with blanks.
  • Flags specify additional formatting options. In the Formatexample, the +flag specifies that the number should always be formatted with a sign, and the 0flag specifies that 0is the padding character. Other flags include -(pad on the right) and ,(format number with locale-specific thousands separators). Note that some flags cannot be used with certain other flags or with certain conversions.
  • The Argument Index allows you to explicitly match a designated argument. You can also specify <to match the same argument as the previous specifier. Thus the example could have said: System.out.format("%f, %<+020.10f %n", Math.PI);
--Oracle