# Introduction

PyTorch is one of the most popular AI libraries in industry; its ease of use and its rich, powerful feature set have earned it wide acclaim. Due to space constraints, this article focuses on how tensor-related operations are implemented in torch and on the relationships among the modules under torch.nn, and analyzes the corresponding code.

Our analysis is organized into the following parts, centered on these questions:

1. How does PyTorch implement tensors on top of Python? How are tensors stored? How are operations such as matrix multiplication, reshape, and transpose implemented?
2. How do tensors interact with the other modules? How do they participate in the forward pass, automatic differentiation, and backward updates?
3. How are the modules under the torch/nn path organized? How is each module implemented so that it is easy to manage, modify, and call?
4. What actually happens when we invoke one of these modules?
5. How does nn.Module support extensibility? What exactly happens when we add a layer to our own neural network?

Since this is an object-oriented programming course, our attention will center on questions 2, 3, and 5; the remaining questions are treated as extensions at the end.

Let us start with a simple piece of code. It contains two linear layers and one ReLU layer, uses SGD as the optimizer and MSELoss to compute the loss, and performs one round of forward and backward propagation:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create an instance of the neural network
model = SimpleNN()

# Define a loss function and an optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Dummy input and target tensors
input_tensor = torch.randn(1, 10)
target_tensor = torch.randn(1, 1)

# Forward pass
output = model(input_tensor)
loss = criterion(output, target_tensor)

# Backward pass and optimization
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(f'Output: {output}')
print(f'Loss: {loss.item()}')
```

Let us step through it, debugger-style, and see what happens:

1. When the SimpleNN class is initialized, it first calls the `__init__` of its parent class nn.Module, defined in the Module class in torch/nn/modules/module.py:

```python
        super(SimpleNN, self).__init__()
```

```python
def __init__(self, *args, **kwargs) -> None:
        """Initialize internal Module state, shared by both nn.Module and ScriptModule."""
        torch._C._log_api_usage_once("python.nn_module")

        # Backward compatibility: no args used to be allowed when call_super_init=False
        if self.call_super_init is False and bool(kwargs):
            raise TypeError(f"{type(self).__name__}.__init__() got an unexpected keyword argument '{next(iter(kwargs))}'"
                            "")

        if self.call_super_init is False and bool(args):
            raise TypeError(f"{type(self).__name__}.__init__() takes 1 positional argument but {len(args) + 1} were"
                            " given")

        """
        Calls super().__setattr__('a', a) instead of the typical self.a = a
        to avoid Module.__setattr__ overhead. Module's __setattr__ has special
        handling for parameters, submodules, and buffers but simply calls into
        super().__setattr__ for all other attributes.
        """
        super().__setattr__('training', True)
        super().__setattr__('_parameters', OrderedDict())
        super().__setattr__('_buffers', OrderedDict())
        super().__setattr__('_non_persistent_buffers_set', set())
        super().__setattr__('_backward_pre_hooks', OrderedDict())
        super().__setattr__('_backward_hooks', OrderedDict())
        super().__setattr__('_is_full_backward_hook', None)
        super().__setattr__('_forward_hooks', OrderedDict())
        super().__setattr__('_forward_hooks_with_kwargs', OrderedDict())
        super().__setattr__('_forward_hooks_always_called', OrderedDict())
        super().__setattr__('_forward_pre_hooks', OrderedDict())
        super().__setattr__('_forward_pre_hooks_with_kwargs', OrderedDict())
        super().__setattr__('_state_dict_hooks', OrderedDict())
        super().__setattr__('_state_dict_pre_hooks', OrderedDict())
        super().__setattr__('_load_state_dict_pre_hooks', OrderedDict())
        super().__setattr__('_load_state_dict_post_hooks', OrderedDict())
        super().__setattr__('_modules', OrderedDict())

        if self.call_super_init:
            super().__init__(*args, **kwargs)
```

nn.Module declares many internal attributes at initialization time. Here they are initialized by calling `__setattr__` on the superclass directly, which, as the docstring explains, bypasses Module's own `__setattr__` and its special handling of parameters, submodules, and buffers.
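Why the bypass matters becomes clearer with a minimal, torch-free sketch of the same pattern (the class and attribute names below are illustrative, not PyTorch's actual implementation):

```python
from collections import OrderedDict

class MiniParameter:
    """Stand-in for torch.nn.Parameter."""
    def __init__(self, data):
        self.data = data

class MiniModule:
    def __init__(self):
        # Use object.__setattr__ to bypass our own __setattr__ below,
        # which expects _parameters/_modules to already exist.
        object.__setattr__(self, "_parameters", OrderedDict())
        object.__setattr__(self, "_modules", OrderedDict())

    def __setattr__(self, name, value):
        # Route "special" values into the registries,
        # everything else into the instance dict as usual.
        if isinstance(value, MiniParameter):
            self._parameters[name] = value
        elif isinstance(value, MiniModule):
            self._modules[name] = value
        else:
            object.__setattr__(self, name, value)

m = MiniModule()
m.weight = MiniParameter([1.0, 2.0])  # captured in _parameters
m.child = MiniModule()                # captured in _modules
m.training = True                     # plain attribute

print(list(m._parameters))  # ['weight']
print(list(m._modules))     # ['child']
```

This is the core of how assigning `self.fc1 = nn.Linear(10, 50)` later gets the submodule registered automatically.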

2. Next, the classes for the individual layers are initialized. Let us take nn.Linear as the example; it also lives in the modules folder, in linear.py, a file of only a few hundred lines.

```python
        self.fc1 = nn.Linear(10, 50)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(50, 1)
```

```python
    def __init__(self, in_features: int, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
        if bias:
            self.bias = Parameter(torch.empty(out_features, **factory_kwargs))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()
```

Linear is likewise a subclass of Module, adding `in_features: int`, `out_features: int`, and `weight: Tensor` on top of it. If `bias=True` is passed, a `bias: Tensor` attribute is added as well.
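The shapes can be checked against a tiny, torch-free sketch of the affine map y = xWᵀ + b that Linear computes, with plain Python lists standing in for tensors:

```python
import random

in_features, out_features = 10, 50

# weight has shape (out_features, in_features), bias (out_features,),
# matching Parameter(torch.empty((out_features, in_features))) above.
weight = [[random.uniform(-1, 1) for _ in range(in_features)]
          for _ in range(out_features)]
bias = [0.0] * out_features

def linear(x):
    # y_j = sum_i x_i * W[j][i] + b[j]
    return [sum(xi * wji for xi, wji in zip(x, wrow)) + bj
            for wrow, bj in zip(weight, bias)]

x = [1.0] * in_features
y = linear(x)
print(len(y))  # 50: one output per row of weight
```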

```python
    def reset_parameters(self) -> None:
        # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
        # uniform(-1/sqrt(in_features), 1/sqrt(in_features)). For details, see
        # https://github.com/pytorch/pytorch/issues/57109
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            init.uniform_(self.bias, -bound, bound)
```

At the end of `__init__`, the `reset_parameters` method is called: `weight` is initialized from a Kaiming uniform distribution and `bias` from a uniform distribution. This avoids the problems that all-zero initialization can cause, such as difficulty reaching a minimum and dead neurons.
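The claim in the source comment can be verified by hand: `kaiming_uniform_` with negative slope `a` samples from U(-bound, bound), where bound = gain · sqrt(3 / fan_in) and gain = sqrt(2 / (1 + a²)); with a = sqrt(5) this collapses to exactly 1/sqrt(fan_in):

```python
import math

fan_in = 10          # in_features of fc1
a = math.sqrt(5)

gain = math.sqrt(2.0 / (1.0 + a ** 2))   # leaky_relu gain: sqrt(2/6) = sqrt(1/3)
bound = gain * math.sqrt(3.0 / fan_in)   # sqrt(1/3) * sqrt(3/fan_in) = sqrt(1/fan_in)

print(math.isclose(bound, 1.0 / math.sqrt(fan_in)))  # True
```

So the seemingly arbitrary `a=math.sqrt(5)` is just a way of expressing U(-1/sqrt(in_features), 1/sqrt(in_features)) through the Kaiming helper, as the linked issue explains.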

3. The criterion is initialized. MSELoss is likewise a subclass of Module; the process is similar to the above and is omitted here.
4. The optimizer is initialized as an SGD instance; this class lives in torch/optim/sgd.py. Here we call the model.parameters() method to hand the network's parameters to the optimizer, for use in later gradient computation and parameter updates.

```python
optimizer = optim.SGD(model.parameters(), lr=0.01)
```
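What the optimizer later does with those parameters can be sketched without torch: for each parameter p with gradient g, `step` performs p ← p − lr·g. The class below is a deliberately bare sketch that ignores momentum, weight decay, and the rest of the real torch.optim.SGD machinery:

```python
class MiniSGD:
    def __init__(self, params, lr):
        # params: list of dicts with 'data' and 'grad' entries,
        # standing in for torch Parameters.
        self.params = params
        self.lr = lr

    def zero_grad(self):
        # Clear stale gradients before the next backward pass.
        for p in self.params:
            p["grad"] = 0.0

    def step(self):
        # Vanilla gradient descent update.
        for p in self.params:
            p["data"] -= self.lr * p["grad"]

w = {"data": 1.0, "grad": 0.0}
opt = MiniSGD([w], lr=0.01)

w["grad"] = 2.0   # pretend loss.backward() filled this in
opt.step()
print(w["data"])  # 0.98
```

This is why the optimizer needs `model.parameters()` up front: it holds references to the parameter tensors and mutates them in place.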

5. The forward pass produces `output`: calling the model directly transfers control to Module's `_wrapped_call_impl` and then to `_call_impl`, which dispatches back to the `forward` method of our custom module. Similarly, since criterion is an instance of the Module subclass MSELoss, calling it invokes its `forward` in the same way.

```python
output = model(input_tensor)
loss = criterion(output, target_tensor)
```

```python
    __call__: Callable[..., Any] = _wrapped_call_impl
    def _wrapped_call_impl(self, *args, **kwargs):
        if self._compiled_call_impl is not None:
            return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
        else:
            return self._call_impl(*args, **kwargs)
```

```python
    def _call_impl(self, *args, **kwargs):
        forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.forward)
        # If we don't have any hooks, we want to skip the rest of the logic in
        # this function, and just call forward.
        if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
                or _global_backward_pre_hooks or _global_backward_hooks
                or _global_forward_hooks or _global_forward_pre_hooks):
            return forward_call(*args, **kwargs)
```

6. The backward pass: these calls invoke functions already defined in the corresponding modules, and we will not unpack them here.

```python
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
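The three calls above can still be traced on a one-parameter toy problem, with the gradient computed by hand instead of by autograd: for loss = (w·x − t)², backward would compute grad = 2(w·x − t)·x, and step applies w ← w − lr·grad:

```python
w, x, t, lr = 1.0, 2.0, 0.0, 0.01

# optimizer.zero_grad(): clear any stale gradient
grad = 0.0

# loss.backward(): d/dw (w*x - t)**2 = 2*(w*x - t)*x
loss = (w * x - t) ** 2
grad = 2 * (w * x - t) * x

# optimizer.step(): gradient descent update
w -= lr * grad

print(loss)  # 4.0
print(w)     # close to 0.92
```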

From the example above, it is easy to see that the modules we use most often are built around the torch.nn.Module class, with the different kinds of layers extending it as subclasses. As the parent class, Module acts like a container, responsible for registering parameters and dispatching the methods of its subclasses; as subclasses, Linear, ReLU, and the rest wrap Module further, guaranteeing correct initialization, a correct forward pass, and so on.

