Coding Chica Java 101

A Trickle or a Flood! Java Stream Basics

Table of Contents

  1. Table of Contents
  2. Introduction
  3. Creating A Basic Stream
  4. Lazy Intermediate Stream Operations
    1. Side Effects
    2. Filtering
    3. Mapping Items
      1. Map
      2. FlatMap
      3. Map to Primitives
    4. Sorting
    5. Removing Duplicates
  5. Terminal Operations
    1. Reducing a Stream to a Single Output
      1. First Value
      2. Last Value
      3. Max Value
      4. Min Value
      5. None / All / Any Match
    2. Outputting A Collection
      1. An Array
      2. To List
      3. Collect
  6. Summary

Introduction

Java streams, such as those in the prior post, provide a way to perform processing on an array, collection, range of objects, file contents, etc. Java streams do not provide any storage. Once the stream’s pipeline is terminated, the stream is generally consumed. A separate calculation or pipeline would require the creation of a new stream. Intermediate Java stream operations are lazy and not executed until a terminal operation is included in the pipeline.
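As a minimal sketch of the "consumed" behavior described above, the following assumes a hypothetical class and method name. It relies on the documented behavior that an implementation may throw an IllegalStateException when it detects that a stream is being reused:

```java
import java.util.List;
import java.util.stream.Stream;

class StreamReuseSketch {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        Stream<String> names = List.of("Ann", "Bob").stream();
        names.count(); // First terminal operation consumes the stream
        try {
            names.count(); // Second terminal operation on the same stream object
            return false;
        } catch (IllegalStateException e) {
            return true; // Reuse was detected and rejected
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());
    }
}
```

To run a second calculation, create a new stream from the source (for example, by calling stream() on the collection again).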

Creating A Basic Stream

A stream can be created in a number of ways. A few examples are:

Scenario | Example Serial Approach | Example Parallel Approach
Array | Arrays.stream(Object[]) | Arrays.stream(Object[]).parallel()
Collection | myCollection.stream() | myCollection.parallelStream()
Range of int values | IntStream.range(0, 10) | IntStream.range(0, 10).parallel()
Build a stream from individual objects | Stream.builder().add(myObject1).add(myObject2).build()… | Stream.builder().add(myObject1).add(myObject2).build().parallel()
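The creation approaches listed above could be exercised roughly as follows. The class and method names are hypothetical, and Collectors.toList() is used only to make the results easy to inspect. Note the explicit type witness on Stream.<String>builder(); without it, the chained builder call produces a Stream<Object>:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamCreationSketch {
    // Array source, processed serially
    static List<String> fromArray(String[] values) {
        return Arrays.stream(values).collect(Collectors.toList());
    }

    // Collection source, processed in parallel
    static long countInParallel(List<String> values) {
        return values.parallelStream().count();
    }

    // Range of int values; the end of the range is exclusive
    static int sumOfRange(int startInclusive, int endExclusive) {
        return IntStream.range(startInclusive, endExclusive).parallel().sum();
    }

    // Stream built from individual objects
    static List<String> fromBuilder(String first, String second) {
        return Stream.<String>builder().add(first).add(second).build()
                .collect(Collectors.toList());
    }
}
```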

Lazy Intermediate Stream Operations

Once a stream or parallel stream has been created, zero or more intermediate stream operations can be added to that stream’s pipeline (chain of instructions). These intermediate operations are lazy – not evaluated until a terminal operation is executed on the pipeline – for performance reasons, to reduce the amount of intermediate state generated.

The intermediate operations come in stateless and stateful flavors. Stateless operations process each item in the stream independently. For example, filtering out null values doesn’t need to know anything about other items in the stream. However, stateful operations may need information about one or more other items in the stream in order to process the current item. An example of a stateful intermediate operation is distinct().
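One way to observe the laziness is to count how often a filter predicate runs before and after a terminal operation is attached. This is only a sketch with hypothetical names; the counter is itself a side effect and is used here purely for demonstration:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyEvaluationSketch {
    // Returns {predicate calls before the terminal operation,
    //          predicate calls after it, number of elements kept}.
    static int[] callsBeforeAndAfterTerminal() {
        AtomicInteger calls = new AtomicInteger();

        Stream<String> pipeline = Stream.of("a", "", "b")
                .filter(item -> {
                    calls.incrementAndGet(); // Counting only to observe evaluation
                    return !item.isEmpty();
                });
        int before = calls.get(); // Nothing has been evaluated yet

        List<String> kept = pipeline.collect(Collectors.toList()); // Terminal operation
        int after = calls.get();

        return new int[] {before, after, kept.size()};
    }
}
```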

Side Effects

When possible, avoid statements that cause side effects outside of the stream, as the underlying Java implementation may not execute those statements, if it deems them unnecessary:

API Note:

An implementation may choose to not execute the stream pipeline (either sequentially or in parallel) if it is capable of computing the count directly from the stream source. In such cases no source elements will be traversed and no intermediate operations will be evaluated. Behavioral parameters with side-effects, which are strongly discouraged except for harmless cases such as debugging, may be affected. For example, consider the following stream:

     List<String> l = Arrays.asList("A", "B", "C", "D");
     long count = l.stream().peek(System.out::println).count();
 

The number of elements covered by the stream source, a List, is known and the intermediate operation, peek, does not inject into or remove elements from the stream (as may be the case for flatMap or filter operations). Thus the count is the size of the List and there is no need to execute the pipeline and, as a side-effect, print out the list elements.

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Stream.html#count()

Also, there may be concerns about the ordering of method calls for each item in the stream. For example, the forEach documentation states:

The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism. For any given element, the action may be performed at whatever time and in whatever thread the library chooses. If the action accesses shared state, it is responsible for providing the required synchronization.

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Stream.html#forEach(java.util.function.Consumer)
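A common way to avoid both concerns is to let the terminal operation build the result instead of mutating shared state from inside the pipeline. The sketch below uses hypothetical names and assumes an ordered source such as a List, for which collect keeps the encounter order even when the stream is parallel:

```java
import java.util.List;
import java.util.stream.Collectors;

class SideEffectFreeSketch {
    // No external list is modified from inside the pipeline;
    // the collector assembles the result.
    static List<Integer> squares(List<Integer> values) {
        return values.parallelStream()
                .map(value -> value * value)
                .collect(Collectors.toList());
    }
}
```

If an action really must run per element in encounter order, forEachOrdered is the documented alternative to forEach, at some cost to parallelism.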

Filtering

A stream’s contents can be filtered in multiple ways, such as applying the filter method or distinct method:

List<String> distinctNames =
    Arrays.stream(namesToGreet)
        .filter(item -> item != null) // Filter out null values
        .filter(item -> !item.isBlank()) // Filter out blank strings
        .distinct() // Filter out duplicate values
...

As seen above, we can filter a stream repeatedly with different logic. These operations are not applied until a terminal operation is executed, and the underlying Java implementation may apply them in as little as one pass through the stream in order to tune runtime performance.
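For experimenting, the filtering pipeline can be wrapped in a small method. This is a sketch only: the class and method names are hypothetical, namesToGreet is assumed to be a String[], and the choice of Collectors.toList() as the terminal operation is an assumption rather than what the original snippet necessarily used:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class FilterSketch {
    static List<String> distinctNames(String[] namesToGreet) {
        return Arrays.stream(namesToGreet)
                .filter(Objects::nonNull)        // Filter out null values
                .filter(item -> !item.isBlank()) // Filter out blank strings
                .distinct()                      // Filter out duplicate values
                .collect(Collectors.toList());   // Assumed terminal operation
    }
}
```

The null check has to come before the blank check; otherwise isBlank() would be called on a null element.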

Mapping Items

A mapping operation is an intermediate operation that transforms the data in some way.

Map

The map operation allows us to provide a transformation where each element in the stream results in a single element during the transformation.

As an example, we can map a collection of Strings to a collection of Characters, representing the first character of each String:

List<Character> firstCharacter = distinctNames.stream() // Stream<String> 
        .map(item -> item.charAt(0)) // Stream<Character>
        .toList();

FlatMap

A flat map allows us to transform each element in a stream into possibly many (a stream of) transformed elements, which is then flattened back into a single, combined stream:

List<String> wordTokens = distinctNames.stream()
        .flatMap(item -> Arrays.stream(item.split("\\W+"))) // There could be multiple words in a name, such as 'Mary Jane', so a stream is the output of the transformation function
        .toList();
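To make the difference between map and flatMap concrete, the following sketch (hypothetical names) applies the same split to both. With map, each name yields exactly one element, a String[]; with flatMap, the arrays are flattened into individual words:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapSketch {
    // One output element per input element: a list of arrays
    static List<String[]> splitWithMap(List<String> names) {
        return names.stream()
                .map(name -> name.split("\\W+")) // Stream<String[]>
                .collect(Collectors.toList());
    }

    // Possibly many output elements per input element: a flat list of words
    static List<String> splitWithFlatMap(List<String> names) {
        return names.stream()
                .flatMap(name -> Arrays.stream(name.split("\\W+"))) // Stream<String>
                .collect(Collectors.toList());
    }
}
```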

Map to Primitives

There are similar mapTo<Primitive> and flatMapTo<Primitive> methods, such as mapToInt, that return a specialized stream with some additional operations. For example, the mapToInt method returns an IntStream, which provides min and max methods that require no inputs (no comparator is needed):

int maxLength = distinctNames.stream() // Stream<String>
        .mapToInt(item -> item.length()) // IntStream
        .max() // OptionalInt
        .orElse(0); // int
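Another of the additional operations on IntStream is summaryStatistics(), which gathers the count, minimum, maximum, sum, and average in a single pass. A short sketch, with a hypothetical class and method name:

```java
import java.util.IntSummaryStatistics;
import java.util.List;

class PrimitiveStreamSketch {
    static IntSummaryStatistics lengthStatistics(List<String> names) {
        return names.stream()           // Stream<String>
                .mapToInt(String::length) // IntStream
                .summaryStatistics();     // IntSummaryStatistics
    }
}
```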

Sorting

Sorting can be performed on the elements of a stream as an interim step, either by the natural ordering or with a provided comparator:
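A minimal sketch of both variants follows. The class and method names are hypothetical, and the length-then-alphabetical comparator is just one possible example of a provided comparator:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortingSketch {
    // Natural ordering (for String, lexicographic)
    static List<String> naturalOrder(List<String> names) {
        return names.stream()
                .sorted()
                .collect(Collectors.toList());
    }

    // Provided comparator: shortest names first, ties broken alphabetically
    static List<String> byLengthThenAlphabetical(List<String> names) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Comparator<String> comparator = byLength.thenComparing(Comparator.naturalOrder());
        return names.stream()
                .sorted(comparator)
                .collect(Collectors.toList());
    }
}
```

Like distinct(), sorted() is a stateful operation: it has to see every element before it can emit the first one.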

Removing Duplicates

Using the equals(Object) logic, duplicate elements can be removed from a stream using the distinct() method as an interim operation.
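As a brief sketch with hypothetical names: for an ordered stream, distinct() keeps the first occurrence of each element, and because String.equals is case-sensitive, "Ann" and "ann" count as different values:

```java
import java.util.List;
import java.util.stream.Collectors;

class DistinctSketch {
    static List<String> withoutDuplicates(List<String> names) {
        return names.stream()
                .distinct() // Compares elements with equals(Object)
                .collect(Collectors.toList());
    }
}
```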

Terminal Operations

A terminal operation consumes a stream (with only a couple of exceptions – iterator and spliterator) and returns the output of the stream’s pipeline of operations.

Reducing a Stream to a Single Output

In some cases, we may want to determine a single value when processing a stream.

First Value

There is a method provided to help us obtain the first value in the stream. In this case, we have already done some filtering to get distinctNames, but the filters could also be applied inline in this stream, if desired:

String firstValue =
    distinctNames.stream()
        .findFirst() // Obtain the first value encountered
        .orElse(null);  // Default to null

Last Value

Similarly, we can obtain the last value in a stream, although there is no pre-defined method for that behavior:

String lastValue =
    distinctNames.stream()
        .reduce((first, second) -> second) // Find last value
        .orElse(null); // Default to null

Max Value

As long as we define the way to compare values (or use a pre-defined comparator) we can obtain the maximum value:

String largestValue =
    distinctNames.stream()
        .max(Comparator.naturalOrder()) // Find largest value
        .orElse(null); // Default to null

Min Value

Similarly, we can obtain the minimum value using either our own or a pre-defined comparator:

String smallestValue =
    distinctNames.stream()
        .min(Comparator.naturalOrder()) // Find smallest value
        .orElse(null); // Default to null
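When defining our own comparison, the comparator decides what "largest" and "smallest" mean. The sketch below (hypothetical names) compares by length and returns the Optional directly rather than defaulting to null, so the caller decides how to handle an empty stream:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class MinMaxSketch {
    // Longest name, or an empty Optional if there are no names
    static Optional<String> longest(List<String> names) {
        return names.stream().max(Comparator.comparingInt(String::length));
    }

    // Shortest name, or an empty Optional if there are no names
    static Optional<String> shortest(List<String> names) {
        return names.stream().min(Comparator.comparingInt(String::length));
    }
}
```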

None / All / Any Match

The noneMatch, allMatch, and anyMatch methods allow us to specify a predicate against which the elements of the stream are tested. Each returns a boolean.

boolean anyNumbersFound = distinctNames.stream() // Stream<String>
        .anyMatch(item -> item.matches(".*\\d+.*")); // boolean
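The three methods can be compared side by side, as in this sketch with hypothetical names. One detail worth knowing is the documented behavior on an empty stream: allMatch and noneMatch return true, while anyMatch returns false:

```java
import java.util.List;

class MatchSketch {
    static boolean anyContainsDigit(List<String> names) {
        return names.stream().anyMatch(item -> item.matches(".*\\d+.*"));
    }

    static boolean allNonBlank(List<String> names) {
        return names.stream().allMatch(item -> !item.isBlank());
    }

    static boolean noneContainsDigit(List<String> names) {
        return names.stream().noneMatch(item -> item.matches(".*\\d+.*"));
    }
}
```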

Outputting A Collection

An Array

The Stream object provides a method to allow the stream pipeline’s output to be returned as an array: toArray().
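The no-argument toArray() returns an Object[]. To get a typed array, an array constructor reference can be passed as a generator. A short sketch with hypothetical names:

```java
import java.util.List;

class ToArraySketch {
    // Untyped result
    static Object[] toObjectArray(List<String> names) {
        return names.stream().toArray();
    }

    // Typed result, using an array constructor reference as the generator
    static String[] toStringArray(List<String> names) {
        return names.stream().toArray(String[]::new);
    }
}
```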

To List

Similarly, the Stream object provides a toList() method (added in Java 16) to return the pipeline’s output as an unmodifiable list.
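The list returned by Stream.toList() cannot be modified, which the sketch below demonstrates; it assumes Java 16 or later and uses hypothetical names. When a modifiable list is needed, collecting into an ArrayList is one option:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class ToListSketch {
    // Returns true if adding to the toList() result is rejected.
    static boolean resultRejectsAdd(List<String> names) {
        List<String> result = names.stream()
                .filter(name -> !name.isBlank())
                .toList();
        try {
            result.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // A modifiable alternative
    static List<String> mutableCopy(List<String> names) {
        return names.stream().collect(Collectors.toCollection(ArrayList::new));
    }
}
```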

Collect

For more complicated collections, such as maps, we need to supply a collector that governs how the elements are gathered into the result.

Map<Character, List<String>> namesByFirstCharacter = distinctNames.stream() // Stream<String>
        .collect(Collectors.groupingBy(name -> name.charAt(0))); // Specify what the map key should be for each element in the stream
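Collectors can also be combined or swapped for other results. The sketch below, with hypothetical names, passes a downstream collector to groupingBy to count names per first character, and uses Collectors.joining to reduce the stream to a single String. It assumes the names are non-empty, since charAt(0) fails on an empty String:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectSketch {
    // Group by first character, then count the names in each group
    static Map<Character, Long> countByFirstCharacter(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(name -> name.charAt(0), Collectors.counting()));
    }

    // Join all names into one comma-separated String
    static String joined(List<String> names) {
        return names.stream().collect(Collectors.joining(", "));
    }
}
```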

Summary

Java streams can help with optimization and parallelization of array, collection or similar data processing. Interim operations can perform filtering, mapping transformations, sorting and other such modifications, but should avoid side effects (state changes outside of the stream) whenever possible. Terminal operations can reduce a stream to a single (often Optional-wrapped) value or can return an array or a collection of elements as a result of the stream processing. Resulting element ordering may not be maintained, especially if parallel stream processing is enabled or sorting is performed as an interim operation.