我已经创建了一个配置单元表,现在我想将snapy压缩数据加载到表中。因此,我做了以下工作:
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
CREATE TABLE toydata_table (id STRING, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";'
然后我创建了一个名为toydata.csv的csv文件,该文件包含以下内容:
A,Value1
B,Value2
C,Value3
我用snzip(https://github.com/kubo/snzip)压缩了这个文件
/usr/local/bin/snzip -t snappy-java toydata.csv
会产生
toydata.csv.snappy
。完成此操作后,我返回到配置单元cli并通过LOAD DATA LOCAL INPATH "toydata.csv.snappy" INTO TABLE toydata_table;
加载数据。但现在我想尝试从该表查询并得到以下错误消息:hive> select * from toydata_table;
OK
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:189)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:175)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:433)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:515)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
我对gzip做了同样的事情,使用gzip也很好。那么,为什么这部分会失败呢?
最佳答案:
请在群集上安装Snappy压缩编解码器。如果要确认是否安装了Snappy,请在库中找到libsnappy.so文件。
另外,还需要用--auxpath参数启动配置单元shell,并提供snappy.jar。例如:hive--auxpath/home/user/snappy1.0.4.1.jar。