1 changed file
pylib
- [Hadoop](http://hadoop.apache.org/) HDFS:
  - `hdfs_find_replication_factor_1.py` - finds HDFS files with replication factor 1, optionally resetting them to replication factor 3 to avoid missing block alerts during datanode maintenance windows
  - `hdfs_time_block_reads.jy` - HDFS per-block read timing debugger with datanode and rack locations for a given file or directory tree. Ends by listing the slowest Hadoop datanodes in descending order of read time. Helps find cluster data layer bottlenecks such as slow datanodes, faulty hardware or misconfigured top-of-rack switch ports.
  - `hdfs_files_native_checksums.jy` - fetches native HDFS checksums for quicker file comparisons (about 100x faster than running `hdfs dfs -cat | md5sum`)
  - `hdfs_files_stats.jy` - fetches HDFS file stats. Useful for generating a list of all files in a directory tree showing block size, replication factor, underfilled blocks and small files.
- [Hive](https://hive.apache.org/) / [Impala](https://impala.apache.org/):
  - `hive_schemas_csv.py` / `impala_schemas_csv.py` - dumps all databases, tables, columns and types to standard output in CSV format
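The README does not say how `hdfs_find_replication_factor_1.py` finds these files. A minimal sketch of the idea, assuming you parse the text output of `hdfs dfs -ls -R` (columns: permissions, replication, owner, group, size, date, time, path; directories show `-` for replication), might look like this. The function names are hypothetical, and the `hdfs dfs -setrep` command is only built here, not run.

```python
def parse_ls_line(line):
    """Parse one line of `hdfs dfs -ls -R` output into (replication, path).

    Returns None for directories (replication column is '-') and for
    non-listing lines such as a "Found N items" header.
    """
    fields = line.split(None, 7)  # maxsplit=7 keeps paths with spaces intact
    if len(fields) < 8 or fields[0].startswith("d"):
        return None
    if not fields[1].isdigit():
        return None
    return int(fields[1]), fields[7]

def find_replication_factor_1(lines):
    """Return the paths of files whose replication factor is exactly 1."""
    found = []
    for line in lines:
        parsed = parse_ls_line(line)
        if parsed and parsed[0] == 1:
            found.append(parsed[1])
    return found

def setrep_command(path, replication=3):
    """Build (but do not execute) an `hdfs dfs -setrep` command for one path."""
    return ["hdfs", "dfs", "-setrep", str(replication), path]

# Hypothetical listing for illustration.
listing = [
    "drwxr-xr-x   - hdfs supergroup          0 2020-01-01 12:00 /data",
    "-rw-r--r--   1 hdfs supergroup       1024 2020-01-01 12:00 /data/a.csv",
    "-rw-r--r--   3 hdfs supergroup       2048 2020-01-01 12:00 /data/b.csv",
]
for path in find_replication_factor_1(listing):
    print(" ".join(setrep_command(path)))
```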
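The "slowest datanodes in descending order" report from `hdfs_time_block_reads.jy` boils down to an aggregate-and-sort step. A minimal sketch, assuming per-block read times have already been measured (for example with `time.perf_counter()` around each block read) and tagged with the serving datanode; the function names and the `(host, seconds)` input shape are assumptions, not the script's actual interface.

```python
import time
from collections import defaultdict

def time_read(read_block):
    """Return the wall-clock seconds taken by one call to read_block()."""
    start = time.perf_counter()
    read_block()
    return time.perf_counter() - start

def rank_slowest_datanodes(block_reads):
    """Rank datanodes by their slowest single block read, slowest first.

    block_reads: iterable of (datanode_host, read_seconds) pairs, one per
    block read. Returns (datanode_host, max_seconds) pairs sorted in
    descending order of max_seconds.
    """
    slowest = defaultdict(float)  # hosts not yet seen start at 0.0 seconds
    for host, seconds in block_reads:
        slowest[host] = max(slowest[host], seconds)
    return sorted(slowest.items(), key=lambda item: item[1], reverse=True)

# Hypothetical timings: (datanode, seconds) for each block read.
reads = [("dn1", 0.12), ("dn2", 0.95), ("dn1", 0.30), ("dn3", 0.05)]
for host, seconds in rank_slowest_datanodes(reads):
    print("%s\t%.2fs" % (host, seconds))
```

Ranking by the worst single read rather than the mean surfaces a datanode that is only intermittently slow; averaging would smooth that out, so which is more useful depends on the fault being hunted.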
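For the checksum comparison, one approach (assumed here, not necessarily what `hdfs_files_native_checksums.jy` does) is to parse the output of `hdfs dfs -checksum`, which prints the path, the checksum algorithm and the checksum value. The tab-separated layout, the function names and the sample values below are assumptions for illustration.

```python
def parse_checksum_line(line):
    """Split one `hdfs dfs -checksum` output line into (path, algorithm, checksum).

    Assumes the tab-separated layout: path<TAB>algorithm<TAB>checksum.
    """
    path, algorithm, checksum = line.rstrip("\n").split("\t")
    return path, algorithm, checksum

def same_content(line_a, line_b):
    """Return True if two checksum lines report the same checksum value.

    Raises ValueError if the algorithm strings differ, since those values
    are not comparable. Identical content stored with different block sizes
    can still produce different checksums, so a False result does not by
    itself prove the content differs.
    """
    _, algo_a, sum_a = parse_checksum_line(line_a)
    _, algo_b, sum_b = parse_checksum_line(line_b)
    if algo_a != algo_b:
        raise ValueError("different checksum algorithms: %s vs %s" % (algo_a, algo_b))
    return sum_a == sum_b

# Made-up sample output lines, for illustration only.
src = "/data/a.csv\tMD5-of-0MD5-of-512CRC32C\t000002000000000000000000aa11bb22"
dst = "/backup/a.csv\tMD5-of-0MD5-of-512CRC32C\t000002000000000000000000aa11bb22"
print(same_content(src, dst))
```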
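The block-level stats listed for `hdfs_files_stats.jy` come down to simple arithmetic on file size and block size. A sketch, assuming "underfilled" means the final block holds fewer bytes than the file's block size and a "small file" is one smaller than a single block (both definitions, and the function name, are assumptions):

```python
MiB = 1024 * 1024

def file_block_stats(size_bytes, block_size_bytes, small_file_threshold=None):
    """Compute block count and fill statistics for one HDFS file.

    Assumptions: the final block is "underfilled" if it holds fewer bytes
    than block_size_bytes, and a file is "small" if it is smaller than
    small_file_threshold (default: one block).
    """
    if small_file_threshold is None:
        small_file_threshold = block_size_bytes
    blocks = -(-size_bytes // block_size_bytes)  # ceiling division on integers
    last_block_bytes = size_bytes - (blocks - 1) * block_size_bytes if blocks else 0
    return {
        "blocks": blocks,
        "last_block_bytes": last_block_bytes,
        "underfilled": blocks > 0 and last_block_bytes < block_size_bytes,
        "small_file": size_bytes < small_file_threshold,
    }

# A 300 MiB file with 128 MiB blocks: 3 blocks, the last holding 44 MiB.
print(file_block_stats(300 * MiB, 128 * MiB))
```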
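The CSV output side of `hive_schemas_csv.py` / `impala_schemas_csv.py` can be sketched with Python's `csv` module. This assumes the schema has already been fetched into a nested dict (the querying step, e.g. `SHOW DATABASES`, `SHOW TABLES` and `DESCRIBE`, is omitted), and the function name and header names are assumptions:

```python
import csv
import io
import sys

def write_schemas_csv(schemas, out=sys.stdout):
    """Write one CSV row per column: database, table, column, type.

    schemas: dict mapping database -> table -> list of (column, type)
    pairs, assumed to be fetched beforehand (querying is not shown).
    """
    writer = csv.writer(out)
    writer.writerow(["database", "table", "column", "type"])
    for database in sorted(schemas):
        for table in sorted(schemas[database]):
            for column, col_type in schemas[database][table]:
                writer.writerow([database, table, column, col_type])

# Hypothetical schema; "decimal(10,2)" gets quoted because it contains a comma.
example = {"default": {"sales": [("id", "int"), ("price", "decimal(10,2)")]}}
buf = io.StringIO()
write_schemas_csv(example, buf)
print(buf.getvalue(), end="")
```

Using the `csv` module rather than joining strings with commas ensures types such as `decimal(10,2)` or `map<string,int>` are quoted correctly.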