Snowflake Has Two Nulls

A Dall-E generated cartoon of a man wearing a shirt labeled Null, looking at his own mismatched reflection.
A Dall-E generated New Yorker cartoon. The dissolving sink tap reflects the Protean nature of Nullness itself, or something.

SQL

SQL's Null handling is confusing. Any value of any type can be Null, unless you impose a not null constraint. Aggregations like count() might include Nulls, or might not — it depends how you ask:

create or replace table null_demo (
    val integer
);

insert into null_demo values 
(123), 
(null);

select count(*) as c1, count(val) as c2
from null_demo

Results

C1 C2
2 1

Null is not equal to anything, even itself, but it's not not equal. All equality checks return Null:

select 123   = null as c1, 
       123  != null as c2,
       null  = null as c3,
       null != null as c4,
       null is null as c5 

Results

C1 C2 C3 C4 C5
null null null null TRUE

SQL's "boolean" operations have three values, not two. Their truth tables can be surprising:

select (true  and true)  as tt,
       (true  and false) as tf,
       (false and false) as ff,
       (true  and null)  as tn,
       (false and null)  as fn,
       (null  and null)  as nn

select (true  or true)   as tt,
       (true  or false)  as tf,
       (false or false)  as ff,
       (true  or null)   as tn,
       (false or null)   as fn,
       (null  or null)   as nn

Truth Table - AND

TT TF FF TN FN NN
TRUE FALSE FALSE null FALSE null

Truth Table - OR

TT TF FF TN FN NN
TRUE TRUE FALSE TRUE null null

For a rough mental model, replace Null with "Unknown". true or x is true for any value of X. false and x is always false. But true and x could be either true or false, depending on whether X is true. If X is unknown, the result is also unknown, which means the truth value of true or null is Unknown, or Null. 123 = x could also be either true or false, depending on X, so 123 = null is also Null. null = null asks "Are these two unknown values equal?" We don't know, so the answer is Null!

This mental model isn't perfect. join does not match Nulls with each other, but group by does combine Nulls, as if Null were a discrete value instead of a placeholder for the unknown. Aggregates like max() ignore Nulls altogether. SQL's three-value logic is not always logical:

select val, 
       (val = 0 or val != 0) as tautology, 
       (val = val) as tautology_2
from null_demo

Results

VAL TAUTOLOGY TAUTOLOGY_2
123 TRUE TRUE
null null null

We might not know which integer this Null represents, but we know it must be either zero or nonzero, and we know it must equal itself. In SQL, tautologies aren't always true.

Weird!

Javascript

Javascript's Null handling is confusing. There are two ways to represent nonexistence: Null and Undefined. These are distinct values with distinct types, but the language sometimes pretends otherwise, often to stay compatible with decades-old bugs, oversights, and hacks. It's fun to read the history behind these behaviors. It's less fun to program around them. Open your browser's developer console and follow along!

typeof undefined;            // undefined
typeof null;                 // object ?

undefined === undefined;     // true
typeof undefined;            // undefined
typeof document.all;         // undefined
document.all === undefined;  // false ?

A nonexistent object property is Undefined, but a nonexistent HTML element is Null:

let data = {a: 1};
data.a;            // 1
data.b;            // undefined

document.getElementByID('DoesNotExist'); // null

Null is a valid JSON value, but Undefined is not. JSON.stringify sometimes throws Undefined values away:

let data = {'a': null, 'b': undefined};
JSON.stringify(data);                   // '{"a":null}'
JSON.parse(JSON.stringify(data));       // {"a": null}

Sometimes converts them to Null:

let data = [1, 2, undefined];
JSON.stringify(data);              // '[1,2,null]'
JSON.parse(JSON.stringify(data));  // [1, 2, null]

And sometimes returns invalid JSON:

let null_json = JSON.stringify(null);
JSON.parse(null_json);       //  null

let undefined_json = JSON.stringify(undefined)
JSON.parse(undefined_json);  // Uncaught SyntaxError: "undefined" is not valid JSON

NaN, everyone's favorite number that isn't a number, is also invalid in JSON, but instead of throwing an error it's converted to Null:

let a = NaN;
a == null;                            // false
JSON.stringify(a);                    // null
a == JSON.parse(JSON.stringify(a));   // false

Null can be coerced to 0. Undefined can be coerced to NaN, which is serialized as Null, which can be coerced to 0:

Number();                           // 0
Number(null);                       // 0
Number(undefined);                  // NaN
Number(JSON.stringify(undefined));  // NaN
Number(JSON.parse(JSON.stringify(Number(JSON.stringify(undefined)))));  // 0 ?

This means that depending on context, and which type coercions Javascript tries under the hood, Null and Undefined might be equal to each other, but also might not be.

null == undefined;  // true
null >= undefined;  // false

Weird!

Snowflake

Snowflake's Null handling is confusing, because Snowflake is a SQL database that integrates with Javascript.

Snowflake lets you define Javascript functions which can be called inside SQL queries. Snowflake also supports Array, Object, and Variant data types, which make it easy to work with JSON. These are great features, and they help make Snowflake the nicest database software I've ever used, by far. For the most part, these systems work well enough that you can't see the seams. But once you hit an edge case, the scales will fall from your eyes.

Snowflake has two null values: SQL Null and JSON Null. When interfacing with Javascript or JSON, the familiar SQL Null maps to Undefined, while the new JSON Null maps to Javascript's Null.

create or replace function get_javascript_null()
	returns VARIANT
	language JAVASCRIPT
	as 
    $$
        return null;
    $$;

create or replace function get_javascript_undefined()
	returns VARIANT
	language JAVASCRIPT
	as 
    $$
        return undefined;
    $$;

select get_javascript_null()            as javascript_null,
       parse_json('null')               as json_null,
       get_javascript_undefined()       as javascript_undefined,
       null                             as sql_null,
       javascript_null = json_null      as null_equality,
       (javascript_undefined is null 
        and sql_null is null)           as undefined_equality

Results

JAVASCRIPT_NULL JSON_NULL JAVASCRIPT_UNDEFINED SQL_NULL NULL_EQUALITY UNDEFINED_EQUALITY
NULLᴶ NULLᴶ null null TRUE TRUE

These two nulls look slightly different in the Snowflake UI. To make the distinction more obvious, I'll use lower case "null" to mean SQL Null, and upper-case "NULLᴶ" with a J superscript to represent JSON Null. This will cause zero confusion. Trust me.

Snowflake provides the functions to_json and parse_json, which are SQL versions of the Javascript functions JSON.parse and JSON.stringify. In Javascript, JSON.parse(undefined) returns Undefined, so Snowflake's corresponding SQL expression parse_json('undefined') returns SQL Null, which is Snowflake's equivalent of Undefined. What did I tell you? Zero confusion. None.

select null                    as sql_null, 
       parse_json('undefined') as also_sql_null, 
       parse_json('null')      as json_null
SQL_NULL ALSO_SQL_NULL JSON_NULL
null null NULLᴶ

When Snowflake creates JSON objects, it tries to match Javascript's serialization logic, kind of. This works for objects, which drop SQL Null just like Javascript drops Undefined:

select object_construct(
    'a', 'hello', 
    'b', null, 
    'c', parse_json('null'), 
    'd', 'goodbye'
) as c1

Results

C1
{ "a": "hello", "c": NULL, "d": "goodbye" }

However, Snowflake does not always match Javascript. Snowflake lets Undefined values slip through if they're in arrays, and it leaves NaN values alone:

select to_json(parse_json('[1, 2, undefined, null]')) as c1,
       to_json(parse_json('{"a": NaN}'))              as c2

Results

C1 C2
'[1, 2, undefined, null]' '{"a": NaN}'

This means Snowflake can write JSON that Javascript refuses to read:

create or replace function parse_json_with_javascript(INPUT_STR VARCHAR)
	returns VARIANT
	language JAVASCRIPT
	as 
    $$
        return JSON.parse(INPUT_STR);
    $$;

select parse_json_with_javascript(to_json(parse_json('[undefined]')));
-- Uncaught SyntaxError: Unexpected token 'u', "[undefined]" is not valid JSON

JSON Null can show up even if you're not touching Javascript / JSON. Snowflake has a phenomenally useful function called parse_url, which breaks a URL into its hostname, path, query parameters, etc. If a URL has no query parameters, it will express this with JSON Null. This represents the fact that some URLs do have parameters, but this specific URL does not. On the other hand, asking for the URL's "fakeField" returns SQL Null. This represents the fact that no URL will ever have values for fakeField, because fakeField does not exist!

select parse_url('https://www.example.com') as url,
       url:"parameters"                     as parameters,
       url:"fakeField"                      as fake_field;

Results

PARAMETERS FAKE_FIELD
NULLᴶ null

If you've ever used Snowflake, you may wonder why you haven't suffocated under the weight of two incompatible Nulls. It's surprisingly easy to use Snowflake for years and never notice JSON Null, because it's fragile. At the drop of a hat, or the slightest hint of type conversion, it will metamorphose into SQL Null.

select parse_json('null')          as json_null
       parse_json('null')::text    as sql_null,
       parse_json('null') and true as sql_null_2,
       parse_json('null') + 1      as sql_null_3

Results

JSON_NULL SQL_NULL SQL_NULL_2 SQL_NULL_3
NULLᴶ null null null

JSON Null is not SQL Null. It's distinct from SQL Null, and of unknown (null) equality with SQL Null. Unlike SQL Null, JSON Null does equal itself.

select parse_json('null')                       as json_null
       parse_json('null') is null               as c1,
       parse_json('null') is distinct from null as c2,
       parse_json('null') = null                as c3,
       parse_json('null') = parse_json('null')  as c4

Results

JSON_NULL C1 C2 C3 C4
NULLᴶ FALSE TRUE null TRUE

SQL Null's type is... null, but JSON Null's type is NULL_VALUE. Snowflake offers a SQL function called is_null_value to test whether something is JSON Null. This function returns null when applied to SQL Null, since, after all, SQL Null could be anything — it could even be JSON Null!

select parse_json('null')                as json_null,
       typeof(123)                       as c1,
       typeof(parse_json('null'))        as c2, 
       typeof(null)                      as c3,
       is_null_value(123)                as c4,
       is_null_value(parse_json('null')) as c5,
       is_null_value(null)               as c6

Results

JSON_NULL C1 C2 C3 C4 C5 C6
NULLᴶ INTEGER NULL_VALUE null FALSE TRUE null

Interestingly, while NULL_VALUE is its own (non-Null) type, you can't create a column of that type.

create or replace table null_demo (
    c1 NULL_VALUE
);  
-- ERROR: Unsupported data type 'NULL_VALUE'

JSON Nulls are sometimes ignored in aggregations, just like SQL Nulls:

create or replace table null_demo (
    val VARIANT
);

insert into null_demo (
    select 123
);

insert into null_demo (
    select null
);

insert into null_demo (
    select parse_json('null')
);

select count(*) as c1, count(val) as c2
from null_demo

Results

C1 C2
3 1

But this behavior interacts strangely with a not null constraint. That constraint does not prevent loading JSON Nulls into a table. Instead, it changes the logic of count() to always count them!

create or replace table null_demo (
    val variant not null
);

insert into null_demo (
    select 123
);

--Fails
insert into null_demo (
    select null
);

--Succeeds
insert into null_demo (
    select parse_json('null')
);

select count(*) as c1, count(val) as c2
from null_demo

Results

C1 C2
2 2

Surprisingly, JSON Null is implemented as a string under the hood, disguised with enough special case logic to give engineers night terrors. In most cases, this is invisible. But when you combine SQL's confusing null handling, two null values, a hacky implementation, and implicit type coercion, there are times when all intuition is meaningless. Only madness remains.

select zeroifnull(123)                     as a1,
       ifnull(123, 0)                      as b1,
       ifnull(123, 'hello')                as c1,

       zeroifnull(null)                    as a2,
       ifnull(null, 0)                     as b2,
       ifnull(null, 'hello')               as c2,

       zeroifnull(parse_json('null'))      as a3,
       ifnull(parse_json('null'), 0)       as b3,
       ifnull(parse_json('null'), 'hello') as c3

Results

A1 B1 C1 A2 B2 C2 A3 B3 C3
123 123 123.0 0 0 hello 0 NULLᴶ hello

Weird!