
本文隶属于专栏《1000个问题搞定大数据技术体系》,该专栏为笔者原创,引用请注明来源,不足和错误之处请在评论区帮忙指出,谢谢!
正文 from_json(jsonStr, schema[, options]) 描述本专栏目录结构和参考文献请见1000个问题搞定大数据技术体系
返回基于给定 jsonStr 和 schema 的结构体类型。
实践SELECt from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
+---------------------------+
|from_json({"a":1, "b":0.8})|
+---------------------------+
| {1, 0.8}|
+---------------------------+
SELECt from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
+--------------------------------+
|from_json({"time":"26/08/2015"})|
+--------------------------------+
| {"time":2015-08-26 00:00:00}|
+--------------------------------+
options 支持列表
从 json 文本 json_txt中基于 路径 path 抽取出一个 JSON 对象。
实践SELECT get_json_object('{"a":"b"}', '$.a');
+-------------------------------+
|get_json_object({"a":"b"}, $.a)|
+-------------------------------+
| b|
+-------------------------------+
path 语法
使用$表示json变量标识,用 . 或 [] 读取对象或数组
get 单层值spark-sql> SELECT get_json_object('{
"store":
{
"fruit":[{"weight":8,"type":"apple"}, {"weight":9,"type":"pear"}],
"bicycle":{"price":19.95,"color":"red"}
},
"email":"amy@only_for_json_udf_test.net",
"owner":"amy"
}', '$.owner');
amy
get 多层值
spark-sql> SELECT get_json_object('{
"store":
{
"fruit":[{"weight":8,"type":"apple"}, {"weight":9,"type":"pear"}],
"bicycle":{"price":19.95,"color":"red"}
},
"email":"amy@only_for_json_udf_test.net",
"owner":"amy"
}', '$.store.bicycle.price');
19.95
get 数组值
spark-sql> SELECT get_json_object('{
"store":
{
"fruit":[{"weight":8,"type":"apple"}, {"weight":9,"type":"pear"}],
"bicycle":{"price":19.95,"color":"red"}
},
"email":"amy@only_for_json_udf_test.net",
"owner":"amy"
}', '$.store.fruit[0]');
{"weight":8,"type":"apple"}
json_array_length(jsonArray)
描述
返回最外层的 JSON 数组jsonArray中元素的数量。
实践SELECT json_array_length('[1,2,3,4]');
+----------------------------+
|json_array_length([1,2,3,4])|
+----------------------------+
| 4|
+----------------------------+
SELECT json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]');
+------------------------------------------------+
|json_array_length([1,2,3,{"f1":1,"f2":[5,6]},4])|
+------------------------------------------------+
| 5|
+------------------------------------------------+
SELECT json_array_length('[1,2');
+-----------------------+
|json_array_length([1,2)|
+-----------------------+
| null|
+-----------------------+
json_object_keys(json_object)
描述
返回最外层 JSON 对象 json_object 的所有 keys 组成的数组。
实践ELECT json_object_keys('{}');
+--------------------+
|json_object_keys({})|
+--------------------+
| []|
+--------------------+
SELECT json_object_keys('{"key": "value"}');
+----------------------------------+
|json_object_keys({"key": "value"})|
+----------------------------------+
| [key]|
+----------------------------------+
SELECT json_object_keys('{"f1":"abc","f2":{"f3":"a", "f4":"b"}}');
+--------------------------------------------------------+
|json_object_keys({"f1":"abc","f2":{"f3":"a", "f4":"b"}})|
+--------------------------------------------------------+
| [f1, f2]|
+--------------------------------------------------------+
json_tuple(jsonStr, p1, p2, …, pn)
描述
- 返回一个类似函数 get_json_object 的元组。
- 这个元组可以有多个名称。
- 所有的输入参数和输出字段类型都是 String 类型。
SELECT json_tuple('{"a":1, "b":2}', 'a', 'b');
+---+---+
| c0| c1|
+---+---+
| 1| 2|
+---+---+
schema_of_json(json[, options])
描述
以 DDL 的形式返回 JSON 字符串 json 的 schema 形式。
实践SELECT schema_of_json('[{"col":0}]');
+----------------------------+
|schema_of_json([{"col":0}])|
+----------------------------+
|ARRAY>|
+----------------------------+
SELECT schema_of_json('[{"col":01}]', map('allowNumericLeadingZeros', 'true'));
+----------------------------+
|schema_of_json([{"col":01}])|
+----------------------------+
|ARRAY>|
+----------------------------+
to_json(expr[, options])
描述
通过给定的结构体返回一个 JSON 字符串。
实践SELECT to_json(named_struct('a', 1, 'b', 2));
+---------------------------------+
|to_json(named_struct(a, 1, b, 2))|
+---------------------------------+
| {"a":1,"b":2}|
+---------------------------------+
SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
+-----------------------------------------------------------------+
|to_json(named_struct(time, to_timestamp(2015-08-26, yyyy-MM-dd)))|
+-----------------------------------------------------------------+
| {"time":"26/08/2015"}|
+-----------------------------------------------------------------+
SELECT to_json(array(named_struct('a', 1, 'b', 2)));
+----------------------------------------+
|to_json(array(named_struct(a, 1, b, 2)))|
+----------------------------------------+
| [{"a":1,"b":2}]|
+----------------------------------------+
SELECT to_json(map('a', named_struct('b', 1)));
+-----------------------------------+
|to_json(map(a, named_struct(b, 1)))|
+-----------------------------------+
| {"a":{"b":1}}|
+-----------------------------------+
SELECT to_json(map(named_struct('a', 1),named_struct('b', 2)));
+----------------------------------------------------+
|to_json(map(named_struct(a, 1), named_struct(b, 2)))|
+----------------------------------------------------+
| {"[1]":{"b":2}}|
+----------------------------------------------------+
SELECT to_json(map('a', 1));
+------------------+
|to_json(map(a, 1))|
+------------------+
| {"a":1}|
+------------------+
SELECT to_json(array((map('a', 1))));
+-------------------------+
|to_json(array(map(a, 1)))|
+-------------------------+
| [{"a":1}]|
+-------------------------+
欢迎分享,转载请注明来源:内存溢出
微信扫一扫
支付宝扫一扫
评论列表(0条)