本文實(shí)例講述了python數(shù)據(jù)預(yù)處理之?dāng)?shù)據(jù)規(guī)范化。分享給大家供大家參考,具體如下:
數(shù)據(jù)規(guī)范化
為了消除指標(biāo)之間的量綱和取值范圍差異的影響,需要進(jìn)行標(biāo)準(zhǔn)化(歸一化)處理,將數(shù)據(jù)按照比例進(jìn)行縮放,使之落入一個(gè)特定的區(qū)域,便于進(jìn)行綜合分析。
數(shù)據(jù)規(guī)范化方法主要有:
- 最小-最大規(guī)范化
- 零-均值規(guī)范化
數(shù)據(jù)示例
代碼實(shí)現(xiàn)
1
2
3
4
5
6
7
8
|
#-*- coding: utf-8 -*- #數(shù)據(jù)規(guī)范化 import pandas as pd import numpy as np datafile = 'normalization_data.xls' #參數(shù)初始化 data = pd.read_excel(datafile, header = none) #讀取數(shù)據(jù) (data - data. min ()) / (data. max () - data. min ()) #最小-最大規(guī)范化 (data - data.mean()) / data.std() #零-均值規(guī)范化 |
從命令行可以看到下面的輸出:
>>> (data-data.min())/(data.max()-data.min(
0 1 2 3
0 0.074380 0.937291 0.923520 1.000000
1 0.619835 0.000000 0.000000 0.850941
2 0.214876 0.119565 0.813322 0.000000
3 0.000000 1.000000 1.000000 0.563676
4 1.000000 0.942308 0.996711 0.804149
5 0.264463 0.838629 0.814967 0.909310
6 0.636364 0.846990 0.786184 0.929571>>> (data-data.mean())/data.std()
0 1 2 3
0 -0.905383 0.635863 0.464531 0.798149
1 0.604678 -1.587675 -2.193167 0.369390
2 -0.516428 -1.304030 0.147406 -2.078279
3 -1.111301 0.784628 0.684625 -0.456906
4 1.657146 0.647765 0.675159 0.234796
5 -0.379150 0.401807 0.152139 0.537286
6 0.650438 0.421642 0.069308 0.595564
上述代碼改為使用print
語(yǔ)句打印,如下:
1
2
3
4
5
6
7
8
|
#-*- coding: utf-8 -*- #數(shù)據(jù)規(guī)范化 import pandas as pd import numpy as np datafile = 'normalization_data.xls' #參數(shù)初始化 data = pd.read_excel(datafile, header = none) #讀取數(shù)據(jù) print ((data - data. min ()) / (data. max () - data. min ())) #最小-最大規(guī)范化 print ((data - data.mean()) / data.std()) #零-均值規(guī)范化 |
可輸出如下打印結(jié)果:
0 1 2 3
0 0.074380 0.937291 0.923520 1.000000
1 0.619835 0.000000 0.000000 0.850941
2 0.214876 0.119565 0.813322 0.000000
3 0.000000 1.000000 1.000000 0.563676
4 1.000000 0.942308 0.996711 0.804149
5 0.264463 0.838629 0.814967 0.909310
6 0.636364 0.846990 0.786184 0.929571
0 1 2 3
0 -0.905383 0.635863 0.464531 0.798149
1 0.604678 -1.587675 -2.193167 0.369390
2 -0.516428 -1.304030 0.147406 -2.078279
3 -1.111301 0.784628 0.684625 -0.456906
4 1.657146 0.647765 0.675159 0.234796
5 -0.379150 0.401807 0.152139 0.537286
6 0.650438 0.421642 0.069308 0.595564
附:代碼中使用到的normalization_data.xls點(diǎn)擊此處下載。
希望本文所述對(duì)大家python程序設(shè)計(jì)有所幫助。
原文鏈接:https://blog.csdn.net/sinat_25873421/article/details/80753121