Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Map: 0%| | 0/10000 [00:00<?, ? examples/s]
2024-05-31 22:52:35 [INFO] Start auto tuning.
2024-05-31 22:52:35 [INFO] Quantize model without tuning!
2024-05-31 22:52:35 [INFO] Quantize the model with default configuration without evaluating the model. To perform the tuning process, please either provide an eval_func or provide an eval_dataloader an eval_metric.
2024-05-31 22:52:35 [INFO] Adaptor has 5 recipes.
2024-05-31 22:52:35 [INFO] 0 recipes specified by user.
2024-05-31 22:52:35 [INFO] 3 recipes require future tuning.
2024-05-31 22:52:36 [INFO] *** Initialize auto tuning
2024-05-31 22:52:36 [INFO] {
2024-05-31 22:52:36 [INFO] 'PostTrainingQuantConfig': {
2024-05-31 22:52:36 [INFO] 'AccuracyCriterion': {
2024-05-31 22:52:36 [INFO] 'criterion': 'relative',
2024-05-31 22:52:36 [INFO] 'higher_is_better': True,
2024-05-31 22:52:36 [INFO] 'tolerable_loss': 0.01,
2024-05-31 22:52:36 [INFO] 'absolute': None,
2024-05-31 22:52:36 [INFO] 'keys': <bound method AccuracyCriterion.keys of <neural_compressor.config.AccuracyCriterion object at 0x7f313bffde90>>,
2024-05-31 22:52:36 [INFO] 'relative': 0.01
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'approach': 'post_training_weight_only',
2024-05-31 22:52:36 [INFO] 'backend': 'default',
2024-05-31 22:52:36 [INFO] 'calibration_sampling_size': [
2024-05-31 22:52:36 [INFO] 100
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'device': 'cpu',
2024-05-31 22:52:36 [INFO] 'diagnosis': False,
2024-05-31 22:52:36 [INFO] 'domain': 'auto',
2024-05-31 22:52:36 [INFO] 'example_inputs': 'Not printed here due to large size tensors...',
2024-05-31 22:52:36 [INFO] 'excluded_precisions': [
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'framework': 'pytorch_fx',
2024-05-31 22:52:36 [INFO] 'inputs': [
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'model_name': '',
2024-05-31 22:52:36 [INFO] 'ni_workload_name': 'quantization',
2024-05-31 22:52:36 [INFO] 'op_name_dict': None,
2024-05-31 22:52:36 [INFO] 'op_type_dict': {
2024-05-31 22:52:36 [INFO] '.*': {
2024-05-31 22:52:36 [INFO] 'weight': {
2024-05-31 22:52:36 [INFO] 'dtype': [
2024-05-31 22:52:36 [INFO] 'int'
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'bits': [
2024-05-31 22:52:36 [INFO] 4
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'algorithm': [
2024-05-31 22:52:36 [INFO] 'AUTOROUND'
2024-05-31 22:52:36 [INFO] ]
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'outputs': [
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'quant_format': 'default',
2024-05-31 22:52:36 [INFO] 'quant_level': 'auto',
2024-05-31 22:52:36 [INFO] 'recipes': {
2024-05-31 22:52:36 [INFO] 'smooth_quant': False,
2024-05-31 22:52:36 [INFO] 'smooth_quant_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'layer_wise_quant': False,
2024-05-31 22:52:36 [INFO] 'layer_wise_quant_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'fast_bias_correction': False,
2024-05-31 22:52:36 [INFO] 'weight_correction': False,
2024-05-31 22:52:36 [INFO] 'gemm_to_matmul': True,
2024-05-31 22:52:36 [INFO] 'graph_optimization_level': None,
2024-05-31 22:52:36 [INFO] 'first_conv_or_matmul_quantization': True,
2024-05-31 22:52:36 [INFO] 'last_conv_or_matmul_quantization': True,
2024-05-31 22:52:36 [INFO] 'pre_post_process_quantization': True,
2024-05-31 22:52:36 [INFO] 'add_qdq_pair_to_weight': False,
2024-05-31 22:52:36 [INFO] 'optypes_to_exclude_output_quant': [
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'dedicated_qdq_pair': False,
2024-05-31 22:52:36 [INFO] 'rtn_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'awq_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'gptq_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'teq_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'autoround_args': {
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'reduce_range': None,
2024-05-31 22:52:36 [INFO] 'TuningCriterion': {
2024-05-31 22:52:36 [INFO] 'max_trials': 100,
2024-05-31 22:52:36 [INFO] 'objective': [
2024-05-31 22:52:36 [INFO] 'performance'
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'strategy': 'basic',
2024-05-31 22:52:36 [INFO] 'strategy_kwargs': None,
2024-05-31 22:52:36 [INFO] 'timeout': 0
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'use_bf16': True
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [WARNING] [Strategy] Please install `mpi4py` correctly if using distributed tuning; otherwise, ignore this warning.
2024-05-31 22:52:36 [INFO] Pass query framework capability elapsed time: 6.51 ms
2024-05-31 22:52:36 [INFO] Do not evaluate the baseline and quantize the model with default configuration.
2024-05-31 22:52:36 [INFO] Quantize the model with default config.
2024-05-31 22:52:36 [INFO] All algorithms to do: {'AUTOROUND'}
2024-05-31 22:52:36 [INFO] quantizing with the AutoRound algorithm
2024-05-31 22:52:36 INFO utils.py L570: Using GPU device
2024-05-31 22:52:38 INFO autoround.py L465: using torch.float16 for quantization tuning
2024-05-31 22:54:14 INFO autoround.py L981: quantizing 1/18, layers.0
2024-05-31 22:55:27 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.042549 -> iter 195: 0.008586
2024-05-31 22:55:36 INFO autoround.py L981: quantizing 2/18, layers.1
2024-05-31 22:56:48 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.012353 -> iter 188: 0.003782
2024-05-31 22:56:58 INFO autoround.py L981: quantizing 3/18, layers.2
2024-05-31 22:58:08 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.003564 -> iter 183: 0.001639
2024-05-31 22:58:18 INFO autoround.py L981: quantizing 4/18, layers.3
2024-05-31 22:59:29 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001812 -> iter 168: 0.000810
2024-05-31 22:59:38 INFO autoround.py L981: quantizing 5/18, layers.4
2024-05-31 23:00:49 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001076 -> iter 168: 0.000581
2024-05-31 23:00:59 INFO autoround.py L981: quantizing 6/18, layers.5
2024-05-31 23:02:10 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000886 -> iter 186: 0.000537
2024-05-31 23:02:20 INFO autoround.py L981: quantizing 7/18, layers.6
2024-05-31 23:03:30 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000899 -> iter 191: 0.000483
2024-05-31 23:03:40 INFO autoround.py L981: quantizing 8/18, layers.7
2024-05-31 23:04:51 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000975 -> iter 139: 0.000531
2024-05-31 23:05:01 INFO autoround.py L981: quantizing 9/18, layers.8
2024-05-31 23:06:12 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001110 -> iter 177: 0.000699
2024-05-31 23:06:22 INFO autoround.py L981: quantizing 10/18, layers.9
2024-05-31 23:07:32 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001358 -> iter 197: 0.000854
2024-05-31 23:07:42 INFO autoround.py L981: quantizing 11/18, layers.10
2024-05-31 23:08:52 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001754 -> iter 178: 0.001185
2024-05-31 23:09:03 INFO autoround.py L981: quantizing 12/18, layers.11
2024-05-31 23:10:13 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.002099 -> iter 168: 0.001497
2024-05-31 23:10:23 INFO autoround.py L981: quantizing 13/18, layers.12
2024-05-31 23:11:35 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.002648 -> iter 194: 0.001862
2024-05-31 23:11:45 INFO autoround.py L981: quantizing 14/18, layers.13
2024-05-31 23:12:55 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.003757 -> iter 106: 0.002598
2024-05-31 23:13:06 INFO autoround.py L981: quantizing 15/18, layers.14
2024-05-31 23:14:17 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.007845 -> iter 105: 0.004465
2024-05-31 23:14:27 INFO autoround.py L981: quantizing 16/18, layers.15
2024-05-31 23:15:37 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.010025 -> iter 126: 0.006707
2024-05-31 23:15:47 INFO autoround.py L981: quantizing 17/18, layers.16
2024-05-31 23:16:58 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.015236 -> iter 160: 0.010168
2024-05-31 23:17:08 INFO autoround.py L981: quantizing 18/18, layers.17
2024-05-31 23:18:18 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.018019 -> iter 193: 0.012850
2024-05-31 23:18:29 INFO autoround.py L1096: quantization tuning time 1551.1706745624542
2024-05-31 23:18:29 INFO autoround.py L1112: Summary: quantized 126/126 in the model
2024-05-31 23:18:30 [INFO] |******Mixed Precision Statistics******|
2024-05-31 23:18:30 [INFO] +------------+---------+---------------+
2024-05-31 23:18:30 [INFO] | Op Type | Total | A32W4G32 |
2024-05-31 23:18:30 [INFO] +------------+---------+---------------+
2024-05-31 23:18:30 [INFO] | Linear | 126 | 126 |
2024-05-31 23:18:30 [INFO] +------------+---------+---------------+
2024-05-31 23:18:30 [INFO] Pass quantize model elapsed time: 1554793.01 ms
2024-05-31 23:18:30 [INFO] Save tuning history to /home/kubwa/kubwai/15-Huggingface/06_Optimzation/nc_workspace/2024-05-31_22-51-42/./history.snapshot.
2024-05-31 23:18:30 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-05-31 23:18:30 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-05-31 23:18:30 [INFO] Save deploy yaml to /home/kubwa/kubwai/15-Huggingface/06_Optimzation/nc_workspace/2024-05-31_22-51-42/deploy.yaml
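The log above walks through three phases: the printed `PostTrainingQuantConfig` (4-bit integer weights quantized with the `AUTOROUND` algorithm in weight-only mode), the per-block AutoRound tuning of all 18 decoder blocks (126 Linear layers in total), and the final mixed-precision summary, where `A32W4G32` means FP32 activations, 4-bit weights, and a group size of 32.

Below is a minimal sketch of the kind of Intel Neural Compressor (2.x API) call that produces a log like this, reconstructed from the configuration values printed above. The model name, calibration texts, collate function, `group_size` value, and save path are placeholders or assumptions, not details taken from the run itself; the exact dataloader format AutoRound expects may also differ, so treat this as a sketch rather than the original script.

```python
# Sketch only: assumes the Intel Neural Compressor 2.x API and a
# Hugging Face causal LM; names marked "placeholder" are not from the log.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "MODEL_ID"  # placeholder: the checkpoint loaded above is not shown in the log
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Stand-in calibration data; the "Map: 0/10000 examples" line suggests the
# real run mapped a ~10k-sample text dataset.
calib_texts = ["placeholder calibration text"] * 100

def collate(batch):
    # batch_size=1, so tokenize a single string (avoids needing a pad token)
    return tokenizer(batch[0], return_tensors="pt", truncation=True, max_length=512)

calib_dataloader = DataLoader(calib_texts, batch_size=1, collate_fn=collate)

# Weight-only PTQ config matching the values printed in the log:
# int4 weights, AUTOROUND algorithm, 100 calibration samples.
conf = PostTrainingQuantConfig(
    approach="weight_only",            # logged internally as 'post_training_weight_only'
    op_type_dict={
        ".*": {                        # apply to every matching op type
            "weight": {
                "dtype": ["int"],
                "bits": [4],
                "group_size": [32],    # assumption, consistent with the A32W4G32 summary
                "algorithm": ["AUTOROUND"],
            },
        },
    },
    calibration_sampling_size=[100],
)

q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
q_model.save("./saved_int4_model")     # placeholder output directory
```

Because no `eval_func` or `eval_dataloader`/`eval_metric` is supplied, the tuner quantizes with the default configuration in a single pass (as the "Quantize model without tuning!" message indicates) instead of iterating against an accuracy criterion.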