Performance
The Count distinct operation is computationally expensive and decreases database performance. BigQuery defaults to approximating the distinct count, using a HyperLogLog algorithm. But BigQuery also supports exact distinct count in a separate function.
One of two algorithms is typically used for distinct counts.
There are two possible solutions to the problem.
If you don’t require 100% accuracy you can use approximate count distinct aka HyperLogLog. Most modern databases support this functionality.
Here is an example of approximate count distinct using SQL Server
SELECT approx_count_distinct(O_OrderKey) AS approx_distinct_order_key
FROM Orders
Bitmap-based distinct counts can be performed incrementally, combining them with Materialized Views for query rewrite in databases that support such functionality. Snowflake and Oracle are two examples of databases that support bitmap-based distinct counts.
If you do not have an approximate count distinct (HyperLogLog) or bitmap based count distinct as part of your database, you could precompute the distinct counts as part of ETL.
For small data volumes you will not see a performance problem when using count distinct.
As the number of distinct values increases, the elapsed time and memory usage of standard count distinct increases significantly (non-linear growth).
The elapsed time and memory usage of approximate count distinct on the other hand remains consistent regardless of the number of distinct values in the data set (linear growth).