
Train-Valid-Test split for custom dataset using PyTorch and TorchVision

Date: 2024-08-11


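Problem description

I have some image data for a binary classification task. The images are organised into two folders, data/model_data/class-A and data/model_data/class-B.

There are N images in total. I want a 70/20/10 split for train/val/test. I am using PyTorch and Torchvision for the task. Here is the code I have so far.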

    import torch
    from torch.utils.data import Dataset, DataLoader
    from torchvision import transforms, utils, datasets, models

    root = "data/model_data"  # parent folder of class-A and class-B
    BATCH_SIZE = 32  # placeholder value; not given in the question
    NUM_WORKER = 2  # placeholder value; not given in the question

    data_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

    # The same (augmenting) transform is attached to every sample here,
    # so all three splits end up sharing it.
    model_dataset = datasets.ImageFolder(root, transform=data_transform)
    total_count = len(model_dataset)  # N, the total number of images
    train_count = int(0.7 * total_count)
    valid_count = int(0.2 * total_count)
    test_count = total_count - train_count - valid_count
    train_dataset, valid_dataset, test_dataset = torch.utils.data.random_split(model_dataset, (train_count, valid_count, test_count))
    train_dataset_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=NUM_WORKER)
    valid_dataset_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=NUM_WORKER)
    test_dataset_loader = torch.utils.data.DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKER)
    dataloaders = {'train': train_dataset_loader, 'val': valid_dataset_loader, 'test': test_dataset_loader}
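
The count arithmetic in this snippet can be checked on its own. The following is a minimal sketch, assuming the total is simply the length of the dataset; `split_counts` and the example total of 103 are hypothetical and not part of the question's code. Giving the test split whatever remains after truncation guarantees that the three counts add up to the total, which `random_split` requires when it is given integer lengths.

```python
def split_counts(total_count, train_fraction=0.7, valid_fraction=0.2):
    """Return (train, valid, test) counts that always sum to total_count."""
    train_count = int(train_fraction * total_count)  # int() truncates toward zero
    valid_count = int(valid_fraction * total_count)
    # The remainder goes to the test split, so no image is left unassigned.
    test_count = total_count - train_count - valid_count
    return train_count, valid_count, test_count


# Hypothetical dataset size, chosen so that the fractions do not divide evenly.
counts = split_counts(103)
print(counts)
```

Because of the truncation, the test split can end up slightly larger than a strict 10% when the total is not a multiple of ten.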
                  

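I feel that this is not the correct way of doing it, for two reasons.

• I am applying the same transform to all the splits. (This is obviously not what I want to do! The solution to this is most likely the answer here.)
• Usually people first split the original data into test/train and then split train into train/val, whereas I am splitting the original data directly into train/val/test. (Is this correct?)
So my question is: is what I am doing correct? (Probably not.)
If it is not correct, how do I go about writing the data loaders to achieve the required splits, so that separate transforms can be applied to each of train/val/test?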

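Recommended answer

"Usually people first split the original data into test/train and then split train into train/val, whereas I am splitting the original data directly into train/val/test. (Is this correct?)"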

Yes, it is completely correct, readable, and perfectly fine overall.
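
One way to see why the direct split is fine: for a given random permutation of the sample indices, cutting it into three pieces at once produces the same partition as holding out the test piece first and then dividing the remainder. The sketch below illustrates this with plain Python lists rather than a real dataset; `direct_split`, `two_stage_split`, and the sizes used are hypothetical names and values chosen for the illustration.

```python
import random


def direct_split(indices, train_count, valid_count):
    """Cut one shuffled index list into train/val/test in a single step."""
    train = indices[:train_count]
    valid = indices[train_count:train_count + valid_count]
    test = indices[train_count + valid_count:]
    return train, valid, test


def two_stage_split(indices, train_count, valid_count):
    """Hold out the test indices first, then split the rest into train/val."""
    test_count = len(indices) - train_count - valid_count
    cut = len(indices) - test_count
    rest, test = indices[:cut], indices[cut:]               # stage 1: train+val vs. test
    train, valid = rest[:train_count], rest[train_count:]   # stage 2: train vs. val
    return train, valid, test


rng = random.Random(0)        # fixed seed so the shuffle is repeatable
indices = list(range(100))    # stand-in for 100 sample indices
rng.shuffle(indices)

one_step = direct_split(indices, 70, 20)
two_step = two_stage_split(indices, 70, 20)
print(one_step == two_step)
```

Either way the three parts are disjoint and together cover every sample, which is the property that matters for an honest evaluation.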

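"I am applying the same transform to all the splits. (This is obviously not what I want to do! The solution to this is most likely the answer here.)"

Yes, that answer is a possibility, but it is pointlessly verbose. You can use the third-party tool torchdata instead. To install it, simply run:

    pip install torchdata

Note that the name torchdata on PyPI is now also used by an official PyTorch package, and the library described here appears to have since been published as torchdatasets, so check the project's documentation for the current install command and import name.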
                  

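The documentation can be found here (disclaimer: I'm the author).

It lets you easily map transformations onto any torch.utils.data.Dataset (in this case onto train). Your code would look like this (only two lines have to change, check the comments; the code is also reformatted to make it easier to follow):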

    import torch
    import torchvision

    import torchdata as td  # the third-party library described above

    root = "data/model_data"  # parent folder of class-A and class-B (from the question)
    BATCH_SIZE = 32  # placeholder value; not given in the question
    NUM_WORKER = 2  # placeholder value; not given in the question

    data_transform = torchvision.transforms.Compose(
        [
            torchvision.transforms.RandomResizedCrop(224),
            torchvision.transforms.RandomHorizontalFlip(),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(
                mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
            ),
        ]
    )

    # Single change, makes an instance of torchdata.Dataset
    # Works just like PyTorch's torch.utils.data.Dataset, but has
    # additional capabilities like .map, cache etc., see project's description
    model_dataset = td.datasets.WrapDataset(torchvision.datasets.ImageFolder(root))
    # Also you shouldn't use transforms here but below
    total_count = len(model_dataset)  # N, the total number of images
    train_count = int(0.7 * total_count)
    valid_count = int(0.2 * total_count)
    test_count = total_count - train_count - valid_count
    train_dataset, valid_dataset, test_dataset = torch.utils.data.random_split(
        model_dataset, (train_count, valid_count, test_count)
    )

    # Apply transformations here only for train dataset
    # (.map comes from torchdata; a plain torch.utils.data.Subset has no such
    # method, so this line relies on that library's behavior)
    train_dataset = train_dataset.map(data_transform)

    # Rest of the code goes the same
    # (valid_dataset and test_dataset are left untransformed in this snippet;
    # they still need at least ToTensor() before a DataLoader can batch them)
    train_dataset_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=NUM_WORKER
    )
    valid_dataset_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=NUM_WORKER
    )
    test_dataset_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKER
    )
    dataloaders = {
        "train": train_dataset_loader,
        "val": valid_dataset_loader,
        "test": test_dataset_loader,
    }

And yes, I agree that specifying transform before splitting is not very clear, and in my opinion this is much more readable.
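
For readers who prefer to stay within core PyTorch, the same idea (split first, attach a transform to each split afterwards) can be sketched with a small wrapper dataset. This is only an illustration under stated assumptions, not the answer author's method: `TransformedSubset` is a hypothetical helper, and a `TensorDataset` with simple lambda functions stands in for the image folder and the torchvision transforms so that the example runs without any image files.

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split


class TransformedSubset(Dataset):
    """Hypothetical helper: applies `transform` to the input of each sample."""

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, index):
        x, y = self.subset[index]  # assumes (input, label) samples
        if self.transform is not None:
            x = self.transform(x)
        return x, y


# Stand-in data: 10 samples with one feature each, all labelled 0.
inputs = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.zeros(10, dtype=torch.long)
base = TensorDataset(inputs, labels)

# Split the untransformed dataset first (seeded for repeatability).
generator = torch.Generator().manual_seed(0)
train_part, valid_part, test_part = random_split(base, (7, 2, 1), generator=generator)

# Then attach a different transform to each split.
train_dataset = TransformedSubset(train_part, transform=lambda x: x + 100.0)  # stands in for augmentation
valid_dataset = TransformedSubset(valid_part, transform=lambda x: x)  # stands in for deterministic preprocessing
test_dataset = TransformedSubset(test_part, transform=lambda x: x)
```

With real images, `base` would be an `ImageFolder` created without a transform, the training split would receive the augmenting pipeline, and the validation and test splits would typically receive a deterministic one (for example `Resize`, `CenterCrop`, `ToTensor`, `Normalize`). The wrapped datasets can then be passed to `DataLoader` exactly as in the code above.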
