Python API¶

load_lib¶

load_lib(path):	Parameters path (string): path of gbdtmo.so Return Python warper of gbdtmo.so

create_graph¶

create_graph(file_name, tree_index=0, value_list=[]):

This function generate a Digraph instance of graphviz. You can render it by yourself.

Parameters

file_name (string): path of the dumped tree.
tree_index (int): the index (start from 0) of tree to be plotted.
value_list (list): list of index of output variables to be plotted. Only for GBDTMO. When set to [], all outputs variables will be considered.

Return

a Digraph instance of a learned tree.

GBDTMulti¶

GBDTMulti(lib, out_dim=1, params={}):

Create an instance of GBDTMO model.

__init__(lib, out_dim, params={}):
	Parameters lib: a Python warper of library by `load_lib`. out_dim(int): dimension of output. params(dict): a set of parameters. If a parameter is not contained here, it is set to its default value.
set_data(train_set=(), eval_set=()):
	Set training and eval datasets. eval_set can be missing. Histograms will be constructed and predictions will be initialized. Parameters train_set(tuple): a tuple of numpy array (x_data, x_label). x_data must be double and 2D array. If you don’t set label, x_label should be None. Otherwise, x_label must be double or int32. eval_set(tuple, default=None): the same as train_set.
_set_gh(self, g, h):
	Set gradient and hessian for growth next tree. Only used for user-defined loss. Parameters g(numpy.array): gradient h(numpy.array): hessian
_set_label(x, is_train):
	Reset label. Sometimes it avoids the re-construction of histogram. Parameters x(numpy.array): labels. is_train(bool): if true, set labels for train_set else for eval_set.
boost():	Growth a new tree after running `_set_gh`.
train(num):	training the model from scratch. Parameters num(int): number of boost round.
dump(path):	dump the model into a text file which has the following structure: Booster[i]: decision node M ... decision node 1 leaf node 1 ... leaf node N Booster[i+1]: ... For a decision node: node index, parent, left, right, split column, split value For a leaf node: leaf index, w_0, w_1, ..., w_n Parameters path(string): must be binary coding. For example, b”tree.txt”.
load(path):	load the model from a text file. Parameters path(string): must be binary coding. For example, b”tree.txt”.
predict(x, num_trees=0):
	Parameters x(numpy.array): input features num_trees(int): number of trees used to compute the prediction. If 0, all trees will be used. Return prediction of x.

GBDTSingle¶

GBDTSO is our own implementation of GBDT for single output. It is used to compare the training speed and accuracy with GBDTMO.

GBDTSingle(lib, out_dim, params={}):

Create an instance of GBDTSO model. Most of method is shared with GBDTSO. Here we only list the specific methods of GBDTSO.

train_multi(num):

training the model from scratch.

Parameters

num(int): number of boost round. In each round, out_dim of trees will be constructed. They correspond to output variables in order.

reset():

clear the learned trees and re-initialize the predictions to base_score.