How to Add an Operator to Relay

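This document goes over the steps needed to register a new TVM operator in Relay. We follow, as an example, a PR that adds a cumulative product operation. That PR itself builds upon another PR that adds a cumulative sum operation.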

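Registering a new operator requires a few steps:

1. Add an attribute node declaring fixed arguments that are known at compile time.

2. Write a type relation for the operator so it integrates into Relay's type system.

3. Use the RELAY_REGISTER_OP macro in C++ to register the operator's arity, type, and other hints for the compiler.

4. Write how the operator is computed.

5. Register the compute and schedule with the Relay operator.

6. Define a C++ function that produces a call node for the operator, and register a Python API hook for the function.

7. Wrap the above Python API hook in a neater interface.

8. Write tests for the new Relay operator.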

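1. Defining an Attribute Node

Attributes are fixed arguments that are known at compile time. The stride and dilation of a convolution operator are good examples of fields that belong in its attribute node.

Attributes should be defined in a file within the folder include/tvm/relay/attrs/.

Ultimately, we want to create an operator whose interface can be seen clearly in the final Python interface: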

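def cumprod(data, axis=None, dtype=None, exclusive=None):
    """Numpy style cumprod op. Return the cumulative inclusive product of the elements along
    a given axis.

    Parameters
    ----------
    data : relay.Expr
        The input data to the operator.
    axis : int, optional
        Axis along which the cumulative product is computed. The default (None) is to compute
        the cumprod over the flattened array.
    dtype : string, optional
        Type of the returned array and of the accumulator in which the elements are multiplied.
        If dtype is not specified, it defaults to the dtype of data.
    exclusive : bool, optional
        If true, will return the exclusive product in which the first element is not
        included. In other words, if true, the j-th output element would be
        the product of the first (j-1) elements. Otherwise, it would be the product of
        the first j elements. The product of zero elements will be 1.

    Returns
    -------
    result : relay.Expr
        The result has the same size as data, and the same shape as data if axis is not None.
        If axis is None, the result is a 1-d array.
    """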

A similar interface exists for cumsum().
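The inclusive and exclusive modes described in the docstring can be illustrated with a minimal pure-Python sketch for the flattened (axis=None) case. The helper name cumprod_reference is hypothetical and is not part of TVM; it only mirrors the documented semantics on a plain list.

```python
def cumprod_reference(values, exclusive=False):
    """Hypothetical reference for the documented semantics (not TVM code).

    Inclusive: output[j] is the product of values[0..j].
    Exclusive: output[j] is the product of values[0..j-1]; the empty product is 1.
    """
    result = []
    running = 1
    for value in values:
        if exclusive:
            result.append(running)  # product of the elements before this one
            running *= value
        else:
            running *= value
            result.append(running)  # product up to and including this one
    return result


inclusive = cumprod_reference([1, 2, 3, 4])
exclusive = cumprod_reference([1, 2, 3, 4], exclusive=True)
print(inclusive)  # [1, 2, 6, 24]
print(exclusive)  # [1, 1, 2, 6]
```

Note that the exclusive result is the inclusive result shifted right by one position, with the empty product 1 filling the first slot.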

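Therefore, when defining the attributes in include/tvm/relay/attrs/transform.h, we choose the axis, the accumulation dtype, and the exclusivity of the operation as the fields of the struct.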

/*! \brief Attributes used in cumsum and cumprod operator */
struct ScanopAttrs : public tvm::AttrsNode<ScanopAttrs> {
  Integer axis;
  DataType dtype;
  Bool exclusive = Bool(false);
  TVM_DECLARE_ATTRS(ScanopAttrs, "relay.attrs.ScanopAttrs") {
    TVM_ATTR_FIELD(axis).describe("The axis to operate over").set_default(NullValue<Integer>());
    TVM_ATTR_FIELD(dtype).describe("Output data type").set_default(NullValue<DataType>());
    TVM_ATTR_FIELD(exclusive)
        .describe("The first element is not included")
        .set_default(Bool(false));
  }
};

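2. Writing a Type Relation

To allow flexibility in registering operators, and greater expressivity and granularity when expressing types in Relay, operators are typed using relations between input and output types. These relations are represented as functions that take a list of input and output types (any of which may be incomplete) and return a list of input and output types that satisfies the relation. This includes shape information that can be determined statically at compile time. Essentially, a relation for an operator can enforce all the necessary typing rules (namely by inspecting the input types) in addition to computing the output type.

The type relation for the cumulative product and sum operators can be found in src/relay/op/tensor/transform.cc: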

TVM_REGISTER_NODE_TYPE(ScanopAttrs);

bool ScanopRel(const Array<Type>& types, int num_inputs, const Attrs& attrs,
               const TypeReporter& reporter) {
  // types: [data, output]
  ICHECK_EQ(types.size(), 2) << "Expects two types, one for the input and another for the output";
  const auto* data = types[0].as<TensorTypeNode>();
  if (data == nullptr) {
    // The input type is not resolved yet; report that the relation cannot be solved now.
    ICHECK(types[0].as<IncompleteTypeNode>())
        << "Scanop: expect input type to be TensorType but get " << types[0];
    return false;
  }

  const auto* param = attrs.as<ScanopAttrs>();

  // Fall back to the input dtype when no output dtype was requested.
  auto dtype = param->dtype;
  if (dtype.is_void()) {
    dtype = data->dtype;
  }

  if (param->axis.defined()) {
    // With an axis, the output has the same shape as the input.
    reporter->Assign(types[1], TensorType(data->shape, dtype));
  } else {
    // Without an axis, the output is a flattened 1-d tensor.
    auto prod = data->shape[0];
    for (size_t i = 1; i < data->shape.size(); ++i) {
      prod = prod * data->shape[i];
    }
    reporter->Assign(types[1], TensorType({prod}, dtype));
  }

  return true;
}
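The shape and dtype logic of this relation can be restated as a small pure-Python sketch. The function scanop_output_type is a hypothetical mirror written only for illustration; it is not part of TVM and works on plain tuples and strings rather than Relay types.

```python
from functools import reduce
from operator import mul


def scanop_output_type(shape, data_dtype, axis=None, dtype=None):
    """Hypothetical Python mirror of the ScanopRel logic (not TVM code).

    Returns the (shape, dtype) pair the relation would assign to the output.
    """
    # An unspecified dtype falls back to the dtype of the input data.
    out_dtype = data_dtype if dtype is None else dtype
    if axis is not None:
        # With an axis, the output keeps the input shape.
        return tuple(shape), out_dtype
    # Without an axis, the input is flattened: one dimension holding
    # the product of all input dimensions.
    flat = reduce(mul, shape[1:], shape[0])
    return (flat,), out_dtype


print(scanop_output_type((2, 3, 4), "float32", axis=1))    # ((2, 3, 4), 'float32')
print(scanop_output_type((2, 3, 4), "float32"))            # ((24,), 'float32')
print(scanop_output_type((2, 3), "int32", dtype="int64"))  # ((6,), 'int64')
```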

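3. Relating the Arity and Attributes to an Operation

We then register the names of the new ops and annotate them with the calling interface. The RELAY_REGISTER_OP macro in C++ allows a developer to specify the following information about an operator in Relay:

Arity (number of arguments)

Names and descriptions for positional arguments

Support level (1 indicates an internal intrinsic; higher numbers indicate less integral or externally supported operators)

A type relation for the operator

Other annotations useful when optimizing the operation.

Once again we add this to src/relay/op/tensor/transform.cc: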

RELAY_REGISTER_OP("cumsum")
    .describe(
        R"doc(Return the cumulative sum of the elements along a given axis.)doc" TVM_ADD_FILELINE)
    .set_num_inputs(1)
    .add_argument("data", "Tensor", "The input tensor.")
    .set_support_level(3)
    .add_type_rel("Cumsum", ScanopRel)
    .set_attr<TOpPattern>("TOpPattern", kOpaque);

RELAY_REGISTER_OP("cumprod")
    .describe(
        R"doc(Return the cumulative product of the elements along a given axis.)doc" TVM_ADD_FILELINE)
    .set_num_inputs(1)
    .add_argument("data", "Tensor", "The input tensor.")
    .set_support_level(3)
    .add_type_rel("Cumprod", ScanopRel)
    .set_attr<TOpPattern>("TOpPattern", kOpaque);

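In this case, TOpPattern is a hint to the compiler about the pattern of computation the operator performs, which can be useful for fusing operators. kOpaque tells TVM not to bother trying to fuse this operator.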
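To make the registration chain easier to read, here is a toy Python sketch of what such a builder-style registration records. Everything in it (OpEntry, REGISTRY, register_op, scanop_rel) is hypothetical and only imitates the method names used in the C++ snippet; it is not TVM's registry and says nothing about how TVM stores this data internally.

```python
class OpEntry:
    """Hypothetical record of what an operator registration stores."""

    def __init__(self, name):
        self.name = name
        self.num_inputs = None
        self.arguments = []
        self.support_level = None
        self.type_rel = None
        self.attrs = {}

    # Each setter returns self so calls can be chained, as in the C++ snippet.
    def set_num_inputs(self, n):
        self.num_inputs = n
        return self

    def add_argument(self, name, type_name, description):
        self.arguments.append((name, type_name, description))
        return self

    def set_support_level(self, level):
        self.support_level = level
        return self

    def add_type_rel(self, rel_name, rel_func):
        self.type_rel = (rel_name, rel_func)
        return self

    def set_attr(self, key, value):
        self.attrs[key] = value
        return self


REGISTRY = {}


def register_op(name):
    """Create (or fetch) the entry for `name` in the toy registry."""
    return REGISTRY.setdefault(name, OpEntry(name))


def scanop_rel(types):
    """Placeholder type relation for the sketch; always succeeds."""
    return True


entry = (
    register_op("cumprod")
    .set_num_inputs(1)
    .add_argument("data", "Tensor", "The input tensor.")
    .set_support_level(3)
    .add_type_rel("Cumprod", scanop_rel)
    .set_attr("TOpPattern", "kOpaque")
)
print(entry.num_inputs, entry.support_level, entry.attrs)
```

The point of the sketch is only that each chained call attaches one piece of metadata (arity, arguments, support level, type relation, extra attributes) to a single named entry that later compiler passes can look up.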

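4. Defining the Compute of the Operation

While we have now defined the interface for our operations, we still need to define how to perform the actual calculations for the cumulative sum and product.

Writing this code is outside the scope of this tutorial. For now, we assume we have a well-tested implementation of the operation's compute. For more details on how to do this, we recommend looking up the tutorials on tensor expressions and TVM's operator inventory (topi), and looking at the example cumulative sum and product implementations in python/tvm/topi/scan.py and the GPU versions in python/tvm/topi/cuda/scan.py. In the case of the cumulative sum and product operations, we write things directly in TIR, which is the representation that tensor expressions and topi lower into.

5. Hooking up Compute and Strategy with Relay

After implementing the compute function, we need to glue it to the Relay operation. Within TVM this means defining not only the computation but also the schedule for an operation. A strategy is a method that picks which computation and which schedule to use. For example, for 2D convolutions we might recognize that we are doing a depthwise convolution and dispatch to a more efficient computation and schedule as a result. In our case, however, there is no such need beyond dispatching between the CPU and GPU implementations. In python/tvm/relay/op/strategy/generic.py and python/tvm/relay/op/strategy/cuda.py we add the following strategies: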

# python/tvm/relay/op/strategy/generic.py
def wrap_compute_scanop(topi_compute):
    """Wrap scanop style topi compute"""

    def _compute_scanop(attrs, inputs, _):
        return [topi_compute(inputs[0], attrs.axis, attrs.dtype, attrs.exclusive)]

    return _compute_scanop


@override_native_generic_func("cumsum_strategy")
def cumsum_strategy(attrs, inputs, out_type, target):
    """cumsum generic strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_scanop(topi.cumsum),
        wrap_topi_schedule(topi.generic.schedule_extern),
        name="cumsum.generic",
    )
    return strategy


@override_native_generic_func("cumprod_strategy")
def cumprod_strategy(attrs, inputs, out_type, target):
    """cumprod generic strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_scanop(topi.cumprod),
        wrap_topi_schedule(topi.generic.schedule_extern),
        name="cumprod.generic",
    )
    return strategy


# python/tvm/relay/op/strategy/cuda.py
@cumsum_strategy.register(["cuda", "gpu"])
def cumsum_strategy_cuda(attrs, inputs, out_type, target):
    """cumsum cuda strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_scanop(topi.cuda.cumsum),
        wrap_topi_schedule(topi.cuda.schedule_scan),
        name="cumsum.cuda",
    )
    return strategy


@cumprod_strategy.register(["cuda", "gpu"])
def cumprod_strategy_cuda(attrs, inputs, out_type, target):
    """cumprod cuda strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_scanop(topi.cuda.cumprod),
        wrap_topi_schedule(topi.cuda.schedule_scan),
        name="cumprod.cuda",
    )
    return strategy
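The dispatch idea behind these strategies (a generic default plus per-target overrides) can be sketched in plain Python. The GenericStrategy class below is a hypothetical stand-in written for illustration; it is not how override_native_generic_func is implemented, and the returned strings merely stand for the chosen implementation name.

```python
class GenericStrategy:
    """Hypothetical stand-in for a target-dispatched strategy function (not TVM code)."""

    def __init__(self, default_func):
        self._default = default_func
        self._by_key = {}

    def register(self, keys):
        """Decorator that registers an override for each target key."""
        def decorator(func):
            for key in keys:
                self._by_key[key] = func
            return func
        return decorator

    def __call__(self, target_keys):
        # Use the first target key that has an override, else the generic default.
        for key in target_keys:
            if key in self._by_key:
                return self._by_key[key]()
        return self._default()


def _generic():
    return "cumsum.generic"


cumsum_strategy = GenericStrategy(_generic)


@cumsum_strategy.register(["cuda", "gpu"])
def _cuda():
    return "cumsum.cuda"


print(cumsum_strategy(["llvm", "cpu"]))  # cumsum.generic
print(cumsum_strategy(["cuda", "gpu"]))  # cumsum.cuda
```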

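In each strategy, we specify the compute we wrote and the schedule to use within add_implementation(). Finally, we link the strategy and compute with the Relay operator in python/tvm/relay/op/_transform.py: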

# cumsum
@_reg.register_compute("cumsum")
def compute_cumsum(attrs, inputs, output_type):
    """Compute definition of cumsum"""
    return [topi.cumsum(inputs[0], attrs.axis, attrs.dtype, attrs.exclusive)]


_reg.register_strategy("cumsum", strategy.cumsum_strategy)
_reg.register_shape_func("cumsum", False, elemwise_shape_func)


# cumprod
@_reg.register_compute("cumprod")
def compute_cumprod(attrs, inputs, output_type):
    """Compute definition of cumprod"""
    return [topi.cumprod(inputs[0], attrs.axis, attrs.dtype, attrs.exclusive)]


_reg.register_strategy("cumprod", strategy.cumprod_strategy)
_reg.register_shape_func("cumprod", False, elemwise_shape_func)

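The shape functions are used to determine the output shape given a dynamically shaped tensor. In this case we tell TVM that the output shape will be the same as the input shape.

6. Creating a Relay Call Node and Exposing a Python Hook

We now have a working operation and just need to call it properly via a Relay call node. This step simply requires writing a function that takes the arguments to the operator (as Relay expressions) and returns a call node to the operator (i.e., the node that should be placed into the Relay AST where the call to the operator is intended).

At present, call attributes and type arguments (the last two fields) are not supported, so it suffices to use Op::Get to fetch the operator's information from the operator registry and pass the arguments to the call node, as shown below. In src/relay/op/tensor/transform.cc: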

Expr MakeCumsum(Expr data, Integer axis, DataType dtype, Bool exclusive) {
  auto attrs = make_object<ScanopAttrs>();
  attrs->dtype = dtype;
  attrs->axis = axis;
  attrs->exclusive = exclusive;
  static const Op& op = Op::Get("cumsum");
  return Call(op, {data}, Attrs(attrs), {});
}

TVM_REGISTER_GLOBAL("relay.op._make.cumsum").set_body_typed(MakeCumsum);

Expr MakeCumprod(Expr data, Integer axis, DataType dtype, Bool exclusive) {
  auto attrs = make_object<ScanopAttrs>();
  attrs->dtype = dtype;
  attrs->axis = axis;
  attrs->exclusive = exclusive;
  static const Op& op = Op::Get("cumprod");
  return Call(op, {data}, Attrs(attrs), {});
}

// Registered under the cumprod name so that _make.cumprod resolves to MakeCumprod.
TVM_REGISTER_GLOBAL("relay.op._make.cumprod").set_body_typed(MakeCumprod);

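Here TVM_REGISTER_GLOBAL exposes the MakeCumsum and MakeCumprod functions in Python via relay.op._make.cumsum(...) and relay.op._make.cumprod(...), respectively.

7. Including a Cleaner Python API Hook

Generally, the convention in Relay is that functions exported through TVM_REGISTER_GLOBAL should be wrapped in a separate Python function rather than called directly from Python. For our operators, we expose this cleaner interface in python/tvm/relay/op/transform.py: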

def cumsum(data, axis=None, dtype=None, exclusive=None):
    return _make.cumsum(data, axis, dtype, exclusive)


def cumprod(data, axis=None, dtype=None, exclusive=None):
    return _make.cumprod(data, axis, dtype, exclusive)

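Note that these Python wrappers can also be a good opportunity to provide an easier interface to the operator. For example, the concat operator is registered as taking only one argument, namely a tuple of the tensors to be concatenated, but the Python wrapper takes the tensors as separate arguments and combines them into a tuple before producing the call node: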

def concat(*args):
    """Concatenate the input tensors along the zero axis.

    Parameters
    ----------
    args: list of Tensor

    Returns
    -------
    tensor: The concatenated tensor.
    """
    tup = Tuple(list(args))
    return _make.concat(tup)

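8. Writing Unit Tests!

Some example unit tests for the cumulative sum and product operators can be found in tests/python/relay/test_op_level3.py.

Gradient Operators

Gradient operators are important for writing differentiable programs in Relay. While Relay's autodiff algorithm can differentiate first-class language constructs, operators are opaque. Because Relay cannot look into the implementation, an explicit differentiation rule must be provided.

Both Python and C++ can be used to write gradient operators, but the examples here focus on Python, as it is more commonly used.

Adding a Gradient in Python

A collection of Python gradient operators can be found in python/tvm/relay/op/_tensor_grad.py. We will walk through two representative examples: sigmoid and multiply.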

@register_gradient("sigmoid")
def sigmoid_grad(orig, grad):
    """Returns [grad * sigmoid(x) * (1 - sigmoid(x))]."""
    return [grad * orig * (ones_like(orig) - orig)]

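The inputs here are the original operator orig and a gradient grad to accumulate into. What we return is a list, where the element at the i-th index is the derivative of the operator with respect to the operator's i-th input. In general, the gradient returns a list with as many elements as there are inputs to the base operator.

Before we analyze this definition further, we should first recall the derivative of the sigmoid function:

$$\frac{\partial \sigma}{\partial x} = \sigma(x)\,(1 - \sigma(x))$$

The definition above looks similar to the mathematical definition, but there is one important addition, described below.

The term orig * (ones_like(orig) - orig) directly matches the derivative, because orig here is the sigmoid function. However, we are not just interested in how to compute the gradient of this function. We are interested in composing this gradient with other gradients, so that we can accumulate the gradient across an entire program. This is where the grad term comes in. In the expression grad * orig * (ones_like(orig) - orig), multiplying by grad specifies how to compose the derivative with the gradient accumulated so far.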
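As a sanity check on this rule, the following sketch applies it to plain Python floats and compares the result with a finite-difference estimate. The functions sigmoid, sigmoid_grad, and loss are defined here only for illustration and operate on scalars, not Relay expressions; the composite function 3 * sigmoid(x) is an arbitrary choice that makes the upstream gradient equal to 3.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(orig, grad):
    """Scalar mirror of the rule above: orig is sigmoid(x), grad is the upstream gradient."""
    return grad * orig * (1.0 - orig)


def loss(x):
    # A small composite function, 3 * sigmoid(x), so the upstream gradient is 3.
    return 3.0 * sigmoid(x)


x = 0.7
analytic = sigmoid_grad(sigmoid(x), grad=3.0)

# Central finite difference of the composite function as an independent check.
eps = 1e-6
numeric = (loss(x + eps) - loss(x - eps)) / (2 * eps)

print(abs(analytic - numeric) < 1e-6)  # True
```

Multiplying by grad is exactly the chain rule: the derivative of the outer computation (here the factor 3) is passed in, and the rule only contributes the local sigmoid derivative.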

Now we consider multiply, a slightly more interesting example:

@register_gradient("multiply")
def multiply_grad(orig, grad):
    """Returns [grad * y, grad * x]"""
    x, y = orig.args
    return [collapse_sum_like(grad * y, x),
            collapse_sum_like(grad * x, y)]

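In this example, there are two elements in the returned list, because multiply is a binary operator. Recall that if

$$f(x, y) = xy$$

the partial derivatives are

$$\frac{\partial f}{\partial x} = y \quad \text{and} \quad \frac{\partial f}{\partial y} = x$$

There is one step required for multiply that is not required for sigmoid, because multiply has broadcasting semantics. Since the shape of grad might not match the shape of the inputs, we use collapse_sum_like to take the contents of the grad * <var> terms and make the shape match the shape of the input we are differentiating with respect to.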
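The need for collapsing can be seen in a small pure-Python sketch in which a 1-D input x is broadcast across the rows of a 2-D input y. The helper collapse_rows is hypothetical and covers only this one special case of what collapse_sum_like does; nested lists stand in for tensors.

```python
def collapse_rows(matrix):
    """Sum a 2-D nested list over its first axis, giving a 1-D list.

    Hypothetical stand-in for collapse_sum_like in the special case where a
    1-D input was broadcast across the rows of a 2-D output.
    """
    return [sum(column) for column in zip(*matrix)]


x = [1.0, 2.0, 3.0]                        # shape (3,)
y = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # shape (2, 3)
grad = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # upstream gradient, shape (2, 3)

# grad * y has the broadcast shape (2, 3); collapse it back to x's shape (3,).
grad_times_y = [[g * v for g, v in zip(g_row, y_row)] for g_row, y_row in zip(grad, y)]
grad_x = collapse_rows(grad_times_y)

# grad * x already has y's shape (2, 3), so nothing needs collapsing.
grad_y = [[g * v for g, v in zip(g_row, x)] for g_row in grad]

print(grad_x)  # [5.0, 7.0, 9.0]
print(grad_y)  # [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
```

Each element of x contributes to every row of the output, so its gradient is the sum of the contributions from all rows; that sum is what the collapse performs.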

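Adding a Gradient in C++

Adding a gradient in C++ is similar to adding one in Python, but the interface for registering it is slightly different.

First, make sure src/relay/transforms/pattern_utils.h is included. It provides helper functions for creating nodes in the Relay AST. Then, define the gradient in a similar fashion to the Python example: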

tvm::Array<Expr> MultiplyGrad(const Expr& orig_call, const Expr& output_grad) {
  const Call& call = orig_call.Downcast<Call>();
  return { CollapseSumLike(Multiply(output_grad, call.args[1]), call.args[0]),
           CollapseSumLike(Multiply(output_grad, call.args[0]), call.args[1]) };
}

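Notice that in C++ we cannot use the operator overloading that is available in Python, and we need to downcast, so the implementation is more verbose. Even so, it is easy to verify that this definition mirrors the earlier Python example.

Now, instead of using a Python decorator, we need to tack a set_attr call for "FPrimalGradient" onto the end of the base operator's registration in order to register the gradient.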

RELAY_REGISTER_OP("multiply")
    // ...
    // Set other attributes
    // ...
    .set_attr<FPrimalGradient>("FPrimalGradient", MultiplyGrad);

Reference:

https://tvm.apache.org/docs/dev/how_to/relay_add_op.html
