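I'm interested in AI, but I prefer studying how models work and have little interest in data processing. Still, it's a bit of a headache when I occasionally need a piece of syntax and can't remember it! So I'm writing these notes to reinforce my memory and to make things easy to look up later.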
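1. DataFrame to Dataset
import tensorflow as tf

def df_to_dataset(dataframe):
    dataframe = dataframe.copy()  # leave the caller's DataFrame untouched
    labels = dataframe.pop('your target')  # replace with your label column name
    # Each element becomes a (dict of feature name -> value, label) pair
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    return ds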
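To remind myself what the resulting dataset looks like, here is a plain-Python stand-in that mimics the element structure without calling TensorFlow. The helper name `slice_rows` and the sample values are made up for illustration.

```python
def slice_rows(features, labels):
    """Yield one (feature_dict, label) pair per row, mimicking the dataset's elements."""
    names = list(features)
    for i, label in enumerate(labels):
        yield {name: features[name][i] for name in names}, label

# Hypothetical columns; the label column has already been popped into `labels`.
features = {"stock": ["AAPL", "TSLA"], "net_asset": [12.5, 48.0]}
labels = [1.0, 0.0]

rows = list(slice_rows(features, labels))
print(rows[0])  # ({'stock': 'AAPL', 'net_asset': 12.5}, 1.0)
```

In real use the dataset usually still needs something like `ds.batch(32)` before it is passed to `model.fit`, since Keras expects batched input.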
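2. Categorical to Numerical (one-hot)
from tensorflow import feature_column as fc

# invest_df is a predefined DataFrame
A = ['stock', 'bond', 'ETF']  # names of the categorical columns
B = []  # collects every feature column fed to the model
for C in A:
    D = invest_df[C].unique()  # vocabulary: the distinct values in this column
    E = fc.categorical_column_with_vocabulary_list(C, D)
    F = fc.indicator_column(E)  # one-hot encode the categorical column
    B.append(F)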
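A minimal sketch of what the one-hot encoding amounts to, in plain Python rather than TensorFlow. The vocabulary values are hypothetical, and the all-zeros result for an unknown value reflects my understanding of the default settings (no out-of-vocabulary buckets).

```python
def one_hot(value, vocabulary):
    """Return a 0/1 vector with a single 1 at the position of `value` in `vocabulary`."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

# Hypothetical result of invest_df['stock'].unique()
vocabulary = ["AAPL", "TSLA", "MSFT"]

print(one_hot("TSLA", vocabulary))  # [0.0, 1.0, 0.0]
print(one_hot("NVDA", vocabulary))  # [0.0, 0.0, 0.0] (value not in the vocabulary)
```

The vector width equals the vocabulary size, so a column with many distinct values produces a wide input.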
3. Numerical to Bucketized
G = fc.numeric_column("net_asset")
# Bucketized column: 8 boundaries produce 9 buckets
H = fc.bucketized_column(G, boundaries=[10, 20, 30, 40, 50, 60, 80, 100])  # in million USD
B.append(H)
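To see which bucket a value lands in, here is a rough plain-Python equivalent using `bisect`. It assumes each bucket includes its left boundary and excludes its right one, which is how I understand `bucketized_column` to behave; the function names are my own.

```python
import bisect

boundaries = [10, 20, 30, 40, 50, 60, 80, 100]  # same boundaries as above, in million USD

def bucket_index(value, boundaries):
    """Index of the bucket containing `value` (left boundary inclusive)."""
    return bisect.bisect_right(boundaries, value)

def bucket_one_hot(value, boundaries):
    """One-hot vector over the len(boundaries) + 1 buckets."""
    vec = [0.0] * (len(boundaries) + 1)
    vec[bucket_index(value, boundaries)] = 1.0
    return vec

print(bucket_index(5, boundaries))    # 0  (below 10)
print(bucket_index(10, boundaries))   # 1  (from 10 up to, not including, 20)
print(bucket_index(75, boundaries))   # 6  (from 60 up to, not including, 80)
print(bucket_index(100, boundaries))  # 8  (100 and above)
```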
4. Feature Cross (Bucketized + Categorical)
I = invest_df['FATFIRE_proximity'].unique()
J = fc.categorical_column_with_vocabulary_list('FATFIRE_proximity', I)
# Cross the bucketized column H with the categorical column J
crossed_feature = fc.crossed_column([H, J], hash_bucket_size=1000)
crossed_feature = fc.indicator_column(crossed_feature)  # one-hot over the 1000 hash buckets
B.append(crossed_feature)
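The idea behind a hashed feature cross can be sketched as follows: combine the values, hash the combination, and take the remainder by `hash_bucket_size`. This is only an illustration; it uses `zlib.crc32` as a stand-in hash, which is not the hash function TensorFlow uses, and the sample values are hypothetical.

```python
import zlib

def cross_bucket(values, hash_bucket_size):
    """Map a combination of feature values to one of `hash_bucket_size` slots."""
    key = "_X_".join(str(v) for v in values)
    # crc32 is a stand-in hash for illustration, not TensorFlow's fingerprint function
    return zlib.crc32(key.encode("utf-8")) % hash_bucket_size

# Hypothetical row: net_asset falls in bucket 6, FATFIRE_proximity is "near"
idx = cross_bucket([6, "near"], hash_bucket_size=1000)
print(0 <= idx < 1000)  # True
```

Because the slot comes from a hash, two different combinations can collide in the same slot; a larger `hash_bucket_size` makes that less likely at the cost of a wider input.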
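5. Putting It to Use
from tensorflow.keras import layers

# input_dim = the number of features above, 8 in this case.
# Suppose the next layer has 12 nodes.
# 8 x 12 is fully connected.
# Note: Dense(12) is not the first layer, so Keras infers its input width from
# DenseFeatures, whose output is the total encoded width of all columns in B.
feature_layer = tf.keras.layers.DenseFeatures(B, dtype='float64')
model = tf.keras.Sequential([
    feature_layer,
    layers.Dense(12, input_dim=8, activation='relu'),
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='linear', name='your_target')  # use your own name; avoid spaces in layer names
])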
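A quick back-of-the-envelope check of how wide the feature layer's output would be, assuming every column in `B` is expanded as described above (one-hot, bucketized, hashed cross). The vocabulary sizes are hypothetical, since they depend on the data in `invest_df`.

```python
def dense_features_width(vocab_sizes, n_boundaries, hash_bucket_size):
    """Total width: one-hot columns + bucketized column + indicator of the crossed column."""
    one_hot_width = sum(vocab_sizes)       # one slot per vocabulary entry
    bucketized_width = n_boundaries + 1    # n boundaries give n + 1 buckets
    return one_hot_width + bucketized_width + hash_bucket_size

# Hypothetical numbers of distinct values in 'stock', 'bond', 'ETF'
vocab_sizes = [5, 3, 4]
width = dense_features_width(vocab_sizes, n_boundaries=8, hash_bucket_size=1000)
print(width)  # 12 + 9 + 1000 = 1021
```

Under these assumptions the first Dense layer would see far more than 8 inputs, which is why leaving the width for Keras to infer is the safer choice.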
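6. Build New Bucketized Features and Cross Them
import numpy as np

# nbuckets must be defined beforehand; boundaries on [0, 1] assume the inputs are scaled to that range
M = np.linspace(0, 1, nbuckets).tolist()  # price boundaries
N = np.linspace(0, 1, nbuckets).tolist()  # volume boundaries
# Each raw feature starts as a numeric column before it is bucketized
OP = fc.bucketized_column(fc.numeric_column('OPEN_PRICE'), M)
CP = fc.bucketized_column(fc.numeric_column('CLOSE_PRICE'), M)
OV = fc.bucketized_column(fc.numeric_column('OPEN_VOL'), N)
CV = fc.bucketized_column(fc.numeric_column('CLOSE_VOL'), N)
# Cross price with volume, then cross the open pair with the close pair
OO = fc.crossed_column([OP, OV], nbuckets * nbuckets)
CC = fc.crossed_column([CP, CV], nbuckets * nbuckets)
new_bucket_cross_feature = fc.crossed_column([OO, CC], nbuckets ** 4)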
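To get a feel for the sizes involved, here is a small arithmetic sketch with a hypothetical `nbuckets = 5`. The `linspace` helper is a plain-Python stand-in for `np.linspace`, written only so the example has no dependencies.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (stand-in for np.linspace)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

nbuckets = 5  # hypothetical value
boundaries = linspace(0.0, 1.0, nbuckets)
print(boundaries)  # [0.0, 0.25, 0.5, 0.75, 1.0]

buckets_per_column = len(boundaries) + 1  # 6, counting the two open-ended outer buckets
pair_cells = buckets_per_column ** 2      # 36 possible (price, volume) combinations
pair_hash_size = nbuckets * nbuckets      # 25 hash slots for OO and CC
quad_hash_size = nbuckets ** 4            # 625 hash slots for the final cross
print(pair_cells, pair_hash_size, quad_hash_size)  # 36 25 625
```

When the inputs really stay within [0, 1] the outer buckets are mostly empty, so the hash sizes are roughly in line with the number of cells that actually occur, though some collisions are still possible.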
7. Plot a Cool Model Diagram and Save It
tf.keras.utils.plot_model(model, 'model.png', show_shapes=False, rankdir='LR')  # or 'TB'; needs pydot and Graphviz installed