Stream operations are one of the main features of Java 8. I have come across this kind of code a lot at work recently, so I thought it would be good to write it up and summarize the key points.
Creation of stream
There are several ways to create streams:
Stream.of()
static <T> Stream<T> of(T... values)
Example:
Stream<String> stringStream = Stream.of("a", "b", "c", "d");
Stream.iterate()
static <T> Stream<T> iterate(T seed, UnaryOperator<T> f)
Creates a stream that starts with a seed value and produces each following element by applying the given operator to the previous one.
Example:
Stream.iterate(10, n -> n + 1)
Stream.generate()
static <T> Stream<T> generate(Supplier<T> s)
Creates a stream from a supplier function, which is called once for each element.
Example:
Stream.generate(Math::random)
Create a stream from an existing collection
List<String> strings = Arrays.asList("hello", "world", "Java8");
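The four creation styles above can be put side by side in a minimal sketch. The class and method names are made up for illustration, and `limit()` is added here because `Stream.iterate()` and `Stream.generate()` produce infinite streams that would otherwise never finish collecting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamCreationSketch {

    // Stream.of: wrap a fixed set of values.
    static List<String> fromValues() {
        return Stream.of("a", "b", "c", "d").collect(Collectors.toList());
    }

    // Stream.iterate is infinite, so limit() bounds it before collecting.
    static List<Integer> fromIterate() {
        return Stream.iterate(10, n -> n + 1).limit(5).collect(Collectors.toList());
    }

    // Stream.generate is also infinite; each element comes from one call to the supplier.
    static List<Double> fromGenerate() {
        return Stream.generate(Math::random).limit(3).collect(Collectors.toList());
    }

    // Collection.stream(): any Collection can produce a stream of its elements.
    static List<String> fromCollection() {
        List<String> strings = Arrays.asList("hello", "world", "Java8");
        return strings.stream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromValues());          // [a, b, c, d]
        System.out.println(fromIterate());         // [10, 11, 12, 13, 14]
        System.out.println(fromGenerate().size()); // 3
        System.out.println(fromCollection());      // [hello, world, Java8]
    }
}
```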
Stream operation
Now that we know how to create a stream, let’s look at the operations that can be applied to it.
Filter and slice
filter()
The filter method accepts a function that returns a boolean as a parameter, and returns a stream containing all the elements that match the condition.
For example, you can select all words starting with the letter w and print them like this:
List<String> words = Arrays.asList("random", "hello", "wow", "world", "java");
Or you can collect the words that match the condition into a list like this:
List<String> words = Arrays.asList("random", "hello", "wow", "world", "java");
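Both variants, printing the matches and collecting them into a list, might look like the following sketch. The class name `FilterSketch` and the helper `startingWithW` are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterSketch {

    // Collect every word that starts with "w" into a new list.
    static List<String> startingWithW(List<String> words) {
        return words.stream()
                .filter(word -> word.startsWith("w"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("random", "hello", "wow", "world", "java");

        // Print each match directly instead of collecting.
        words.stream()
                .filter(word -> word.startsWith("w"))
                .forEach(System.out::println); // prints wow, then world

        System.out.println(startingWithW(words)); // [wow, world]
    }
}
```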
distinct()
The distinct method will return a stream of elements that are unique.
For example, the following code returns the list that has no duplicates:
List<Integer> numbers = Arrays.asList(1, 2, 1, 3, 2, 1, 3, 4);
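A minimal sketch of the full pipeline for this list, with a hypothetical helper named `withoutDuplicates`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctSketch {

    // distinct() compares elements with equals() and, for an ordered stream,
    // keeps the first occurrence of each value.
    static List<Integer> withoutDuplicates(List<Integer> numbers) {
        return numbers.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 1, 3, 2, 1, 3, 4);
        System.out.println(withoutDuplicates(numbers)); // [1, 2, 3, 4]
    }
}
```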
skip()
The skip(n) method returns a stream that throws away the first n elements. If there are fewer than n elements in the stream, an empty stream is returned.
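As a small illustration of both cases described above, here is a sketch with a hypothetical helper `dropFirst`. The input values are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SkipSketch {

    // Discard the first n elements and collect whatever remains.
    static List<Integer> dropFirst(List<Integer> numbers, long n) {
        return numbers.stream()
                .skip(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(dropFirst(numbers, 2));  // [3, 4, 5]
        System.out.println(dropFirst(numbers, 10)); // [] because fewer than 10 elements exist
    }
}
```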
Mapping
A very common data processing task is to select information from certain objects. For example, in SQL you can select a column from a table. The Stream API provides similar tools through the map and flatMap methods.
map()
Before Java 8, extracting one field from every object in a collection meant looping over the collection and copying that field into another collection by hand. With streams, we can simply use map to pick out the field we want and then collect the results into a new collection.
public static void main(String[] args) {
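As a minimal sketch of this idea, assume a hypothetical `Person` class with a name and an age (invented here purely for illustration). `map` can then pull out just the names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapSketch {

    // Hypothetical value class, invented for this example.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() {
            return name;
        }

        int getAge() {
            return age;
        }
    }

    // Map each Person to its name and collect the names into a new list.
    static List<String> names(List<Person> people) {
        return people.stream()
                .map(Person::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(new Person("Ann", 30), new Person("Bob", 25));
        System.out.println(names(people)); // [Ann, Bob]
    }
}
```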
flatMap()
If we want to know how many unique characters there are in a list of words, what should we do?
List<String> words = Arrays.asList("Hello", "World");
This is wrong because the function passed to map returns a String[], so we get a Stream<String[]> instead of a Stream<String>. To solve this problem, we can use flatMap:
List<String> words = Arrays.asList("Hello", "World");
The effect of flatMap is that each array is mapped not to a stream of its own but to the contents of that stream. In a nutshell, flatMap lets you convert each value in a stream into another stream and then joins all of those streams into one.
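The difference between the two attempts can be sketched as follows. Splitting each word with `split("")` is an assumption about how the letters are obtained, and the method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapSketch {

    // map alone yields a Stream<String[]>, so distinct() compares whole arrays
    // (by identity) rather than individual letters.
    static long distinctArrays(List<String> words) {
        return words.stream()
                .map(word -> word.split(""))
                .distinct()
                .count();
    }

    // flatMap replaces each String[] with its elements, giving one Stream<String>.
    static List<String> uniqueCharacters(List<String> words) {
        return words.stream()
                .map(word -> word.split(""))
                .flatMap(Arrays::stream)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "World");
        System.out.println(distinctArrays(words));   // 2 (two arrays, not letters)
        System.out.println(uniqueCharacters(words)); // [H, e, l, o, W, r, d]
    }
}
```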
Find and match
Another common data processing task is to check whether certain elements in a dataset match a given condition. The Stream API provides such tools through the allMatch, anyMatch, noneMatch, findFirst, and findAny methods.
The method names are self-explanatory. Let’s see some examples.
To check if a collection contains even numbers:
List<Integer> numbers = Arrays.asList(1, 2, 3);
To find the first number in the list whose square is divisible by 3:
List<Integer> numbers2 = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
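The two checks above could be written roughly like this. The helper names are hypothetical, and the second helper is assumed to return the number itself rather than its square.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class MatchSketch {

    // anyMatch stops as soon as one element satisfies the predicate.
    static boolean containsEven(List<Integer> numbers) {
        return numbers.stream().anyMatch(n -> n % 2 == 0);
    }

    // findFirst returns an Optional because the stream may have no matching element.
    static Optional<Integer> firstWithSquareDivisibleBy3(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> (n * n) % 3 == 0)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(containsEven(numbers));                       // true
        System.out.println(numbers.stream().allMatch(n -> n > 0));       // true
        System.out.println(numbers.stream().noneMatch(n -> n > 3));      // true

        List<Integer> numbers2 = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(firstWithSquareDivisibleBy3(numbers2));       // Optional[3]
    }
}
```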
Reduce
We might need to complete more complex tasks, such as “choose the longest word in a list” or “calculate the total length of all words”. Such queries repeatedly combine the elements of the stream to arrive at a final value, so they can be classified as reduction operations (reducing the stream to a single value).
To calculate the sum of a list of numbers:
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
reduce feeds each intermediate result back in together with the next element. On reaching the first element 1, it computes the initial value 0 + 1 = 1; it then combines that result 1 with the second element 2 to get 3, and so on, until the whole list has been summed.
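The sum, together with the two queries mentioned earlier (longest word and total length), might be sketched as below. The method names and the word list are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ReduceSketch {

    // Start from the identity 0 and add each element to the running result.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, (a, b) -> a + b);
    }

    // Without an identity value, reduce returns an Optional (empty for an empty stream).
    // On a tie in length, the earlier word is kept.
    static Optional<String> longest(List<String> words) {
        return words.stream().reduce((a, b) -> a.length() >= b.length() ? a : b);
    }

    // Map each word to its length first, then reduce the lengths to a total.
    static int totalLength(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4, 5))); // 15

        List<String> words = Arrays.asList("random", "hello", "wow", "world", "java");
        System.out.println(longest(words));     // Optional[random]
        System.out.println(totalLength(words)); // 23
    }
}
```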
Intermediate and terminal operation
All operations in the Stream API fall into two categories: intermediate operations and terminal operations. Intermediate operations are lazy: they only describe the pipeline, and nothing is actually computed until a terminal operation is invoked.
Intermediate operations can be divided into stateless and stateful. A stateless operation processes each element independently of the elements before it. A stateful operation needs information about previously seen elements, and some must see every element before producing a result. For example, sorting is a stateful operation, and the sorting result cannot be determined until all elements are read.
Terminal operations can be divided into short-circuit and non-short-circuit operations. A short-circuit operation can return its result without processing all the elements, such as finding the first element that satisfies a condition.
A stream pipeline consists of a stream source, followed by zero or more intermediate operations, and a terminal operation.
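Laziness and short-circuiting can be observed with a small experiment. The sketch below uses `peek()` to record which elements are actually visited. The class and method names are hypothetical, and the stream is assumed to be sequential.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class LazySketch {

    // Building a pipeline without a terminal operation visits no elements at all.
    static int visitedWithoutTerminal(List<Integer> numbers) {
        List<Integer> visited = new ArrayList<>();
        Stream<Integer> pipeline = numbers.stream()
                .peek(visited::add)
                .filter(n -> n % 2 == 0);
        return visited.size();
    }

    // findFirst is short-circuiting: elements after the first match are never visited.
    static List<Integer> visitedBeforeFirstEven(List<Integer> numbers) {
        List<Integer> visited = new ArrayList<>();
        numbers.stream()
                .peek(visited::add)
                .filter(n -> n % 2 == 0)
                .findFirst();
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedWithoutTerminal(Arrays.asList(1, 2, 3)));       // 0
        System.out.println(visitedBeforeFirstEven(Arrays.asList(1, 3, 4, 5, 6))); // [1, 3, 4]
    }
}
```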
To summarize the Stream API:
Category | Type | Operation | Description
---|---|---|---
Intermediate operation | Stateless | unordered() | Drops the ordering constraint. If order doesn't matter, it can be combined with parallel() to speed up some operations
Intermediate operation | Stateless | filter() | Keep only the elements of the stream that match a filter function
Intermediate operation | Stateless | map() | Map the stream to another stream based on a map function
Intermediate operation | Stateless | mapToInt() mapToLong() mapToDouble() | Map to a stream of int, long or double
Intermediate operation | Stateless | flatMap() | Turn [["ABC", "DEF"], ["FGH", "IJK"]] into ["ABC", "DEF", "FGH", "IJK"]
Intermediate operation | Stateless | flatMapToInt() flatMapToLong() flatMapToDouble() | Like flatMap(), with results similar to mapToInt(), mapToLong(), mapToDouble()
Intermediate operation | Stateless | peek() | Perform the specified action on each element as it passes through and return a stream that can be used further
Intermediate operation | Stateful | distinct() | Filter out duplicate elements in the stream
Intermediate operation | Stateful | sorted() | Sort the stream
Intermediate operation | Stateful | limit() | Limit the number of elements in the stream
Intermediate operation | Stateful | skip() | Skip a certain number of elements in the stream
Terminal operation | Short-circuit | anyMatch() | Return whether any element in the stream satisfies the condition
Terminal operation | Short-circuit | allMatch() | Return whether all elements in the stream satisfy the condition
Terminal operation | Short-circuit | noneMatch() | Return whether no element in the stream satisfies the condition
Terminal operation | Short-circuit | findFirst() | Return the first element of the stream, if there is one
Terminal operation | Short-circuit | findAny() | Return any element of the stream, if there is one
Terminal operation | Non-short-circuit | forEach() | Perform an action on every element in the stream
Terminal operation | Non-short-circuit | forEachOrdered() | Perform an action on every element in encounter order, which limits the benefit of parallel()
Terminal operation | Non-short-circuit | toArray() | Return an array of all elements in the stream
Terminal operation | Non-short-circuit | reduce() | Reduce all elements to one result based on a given function
Terminal operation | Non-short-circuit | collect() | Accumulate the elements of the stream into a result, such as a list
Terminal operation | Non-short-circuit | max() | Find the max element in the stream
Terminal operation | Non-short-circuit | min() | Find the min element in the stream
Terminal operation | Non-short-circuit | count() | Get the number of elements in the stream