After I posted the question on Stack Overflow, I quickly got a reply from Mike: https://stackoverflow.com/questions/23368191/compute-engine-use-gsutil-to-download-tgz-file-has-crcmod-error
The object you're trying to download is a composite object (https://developers.google.com/storage/docs/composite-objects), which basically means it was uploaded in parallel chunks. gsutil automatically does this when uploading objects larger than 150M (a configurable threshold), to provide better performance.
Composite objects only have a crc32c checksum (no MD5), so in order to validate data integrity when downloading composite objects, gsutil needs to perform a crc32c checksum. Unfortunately, the libraries distributed with Python don't include a compiled crc32c implementation, so unless you install a compiled crc32c, gsutil will use a non-compiled Python implementation of crc32c that's quite slow. That warning is printed to let you know there's a way to fix that performance problem: Please run:
gsutil help crcmod
and follow the instructions there for installing a compiled crc32c. It's pretty easy to do it, and worth the effort.
One other note: I strongly recommend against setting check_hashes = never in your boto config file. That will disable integrity checking, which means it's possible your download could get corrupted and you wouldn't know it. You want data integrity checking enabled to ensure you're working with correct data.
Mike
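To get a feel for what the "non-compiled Python implementation" Mike mentions has to do, here is a minimal sketch of a table-driven crc32c in pure Python. This is my own illustration, not gsutil's actual code; the function names crc32c and crc32c_base64 are made up for this example. The base64 helper encodes the checksum as big-endian bytes, which is the form Cloud Storage uses when it reports an object's crc32c.

```python
import base64
import struct

# Reflected form of the Castagnoli polynomial used by CRC-32C.
_POLY = 0x82F63B78


def _make_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, crc=0):
    """Return the CRC-32C of `data` (bytes); pass `crc` to continue a running checksum."""
    crc ^= 0xFFFFFFFF
    for byte in data:  # one table lookup per byte, all in interpreted Python
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data):
    """Encode the checksum as base64 of its 4 big-endian bytes."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))
    print(crc32c_base64(b"123456789"))
```

The loop handles one byte per iteration inside the interpreter, so on a file of several hundred MB it runs hundreds of millions of iterations. A compiled extension does the same work in C, which is why the warning suggests installing one.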
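So the cause of this problem is that a compiled crc32c implementation (the crcmod extension) was not installed.
Running "gsutil help crcmod" lists the installation steps for each kind of system.
That's pretty thoughtful of Google :D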
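After following the install steps, it helps to confirm that gsutil really picked up the compiled module. The sketch below assumes that "gsutil version -l" prints a line of the form "compiled crcmod: True" (or False); that line format is my assumption about the output, and the helper names are hypothetical.

```python
import subprocess


def parse_compiled_crcmod(version_output):
    """Return True/False from a 'compiled crcmod: ...' line, or None if the line is absent.

    The 'compiled crcmod' key is an assumption about what `gsutil version -l` prints.
    """
    for line in version_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "compiled crcmod":
            return value.strip() == "True"
    return None


def gsutil_has_compiled_crcmod():
    """Run `gsutil version -l` and report whether a compiled crcmod is in use."""
    result = subprocess.run(
        ["gsutil", "version", "-l"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compiled_crcmod(result.stdout)


if __name__ == "__main__":
    print(gsutil_has_compiled_crcmod())
```

If this reports False after installation, the compiled extension was probably installed for a different Python interpreter than the one gsutil runs under.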