Python Object Detection: A Detailed Walkthrough of the SSD Training Source Code

Uploaded by: 搞*** · Upload date: 2025-05-16


Contents

Preface
Structure of this walkthrough
Model training workflow
1. Setting parameters
2. Reading the dataset
3. Building the SSD network
4. Preprocessing the dataset
5. Encoding the boxes
6. Computing the loss
7. Training and saving the model
Starting training

Preface

After spending a long time studying the SSD algorithm, today I will walk through the training code.

For the prediction code, see /article/246905.htm

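Structure of this walkthrough

This tutorial focuses on the training code, specifically on how the training function executes and the reasoning behind each step.

The training function runs roughly in the following stages:

1. Set the training parameters.

2. Read the dataset.

3. Build the SSD network.

4. Preprocess the dataset.

5. Encode the ground-truth boxes so that their format matches the network's predictions and the two can be compared.

6. Compute the loss.

7. Run gradient descent with the optimizer and save the model.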

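Before reading on, I recommend downloading my simplified version of the source code and following along with it. How to run it is covered in the "Starting training" section.

Download link: /s/1K4RAJvLj11blywuX2CrLSA

Extraction code: 4wbi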

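Model training workflow

This article is based on the ssd_vgg_300 source code. I simplified it, kept the prediction part extracted in the previous article, and added the training part, which makes the overall SSD framework easier to follow.

1. Setting parameters

Before loading the dataset, a series of parameters must be set. They fall into several groups. The first group contains flags specific to the SSD network: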

import tensorflow as tf  # TensorFlow 1.x API (tf.app.flags)

# =========================================================================== #
# SSD Network flags.
# =========================================================================== #
# Weight (alpha) applied to the localization loss
tf.app.flags.DEFINE_float(
    'loss_alpha', 1., 'Alpha parameter in the loss function.')
# Ratio of negative to positive samples
tf.app.flags.DEFINE_float(
    'negative_ratio', 3., 'Negative ratio in the loss function.')
# After ground-truth encoding, anchors whose match score exceeds
# match_threshold are treated as positive samples
tf.app.flags.DEFINE_float(
    'match_threshold', 0.5, 'Matching threshold in the loss function.')

The second group contains general training parameters (progress logging, checkpoint schedule and so on):

# =========================================================================== #
# General Flags.
# =========================================================================== #
# train_dir stores the trained model and the logs
tf.app.flags.DEFINE_string(
    'train_dir', '/tmp/tfmodel/',
    'Directory where checkpoints and event logs are written to.')
# num_readers is the number of parallel readers used to read the dataset
tf.app.flags.DEFINE_integer(
    'num_readers', 4,
    'The number of parallel readers that read data from the dataset.')
# Number of threads used to build the training batches
tf.app.flags.DEFINE_integer(
    'num_preprocessing_threads', 4,
    'The number of threads used to create the batches.')
# Print a log line to the console every 10 steps
tf.app.flags.DEFINE_integer(
    'log_every_n_steps', 10,
    'The frequency with which logs are printed.')
# Save summaries every 600 seconds
tf.app.flags.DEFINE_integer(
    'save_summaries_secs', 600,
    'The frequency with which summaries are saved, in seconds.')
# Save the model every 600 seconds
tf.app.flags.DEFINE_integer(
    'save_interval_secs', 600,
    'The frequency with which the model is saved, in seconds.')
# Fraction of GPU memory that may be used
tf.app.flags.DEFINE_float(
    'gpu_memory_fraction', 0.7, 'GPU memory fraction to use.')

The third group contains the optimizer parameters:

# =========================================================================== #
# Optimization Flags.
# =========================================================================== #
# Optimizer parameters
# Weight decay
tf.app.flags.DEFINE_float(
    'weight_decay', 0.00004, 'The weight decay on the model weights.')
# Which optimizer to use
tf.app.flags.DEFINE_string(
    'optimizer', 'rmsprop',
    'The name of the optimizer, one of "adadelta", "adagrad", "adam",'
    '"ftrl", "momentum", "sgd" or "rmsprop".')
tf.app.flags.DEFINE_float(
    'adadelta_rho', 0.95,
    'The decay rate for adadelta.')
tf.app.flags.DEFINE_float(
    'adagrad_initial_accumulator_value', 0.1,
    'Starting value for the AdaGrad accumulators.')
tf.app.flags.DEFINE_float(
    'adam_beta1', 0.9,
    'The exponential decay rate for the 1st moment estimates.')
tf.app.flags.DEFINE_float(
    'adam_beta2', 0.999,
    'The exponential decay rate for the 2nd moment estimates.')
tf.app.flags.DEFINE_float('opt_epsilon', 1.0, 'Epsilon term for the optimizer.')
tf.app.flags.DEFINE_float('ftrl_learning_rate_power', -0.5,
                          'The learning rate power.')
tf.app.flags.DEFINE_float(
    'ftrl_initial_accumulator_value', 0.1,
    'Starting value for the FTRL accumulators.')
tf.app.flags.DEFINE_float(
    'ftrl_l1', 0.0, 'The FTRL l1 regularization strength.')
tf.app.flags.DEFINE_float(
    'ftrl_l2', 0.0, 'The FTRL l2 regularization strength.')
tf.app.flags.DEFINE_float(
    'momentum', 0.9,
    'The momentum for the MomentumOptimizer and RMSPropOptimizer.')
tf.app.flags.DEFINE_float('rmsprop_momentum', 0.9, 'Momentum.')
tf.app.flags.DEFINE_float('rmsprop_decay', 0.9, 'Decay term for RMSProp.')

The fourth group contains the learning-rate parameters:

# =========================================================================== #
# Learning Rate Flags.
# =========================================================================== #
# How the learning rate decays: fixed, exponential decay, etc.
tf.app.flags.DEFINE_string(
    'learning_rate_decay_type',
    'exponential',
    'Specifies how the learning rate is decayed. One of "fixed", "exponential",'
    ' or "polynomial"')
# Initial learning rate
tf.app.flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
# Final learning rate
tf.app.flags.DEFINE_float(
    'end_learning_rate', 0.0001,
    'The minimal end learning rate used by a polynomial decay learning rate.')
tf.app.flags.DEFINE_float(
    'label_smoothing', 0.0, 'The amount of label smoothing.')
# Learning-rate decay factor
tf.app.flags.DEFINE_float(
    'learning_rate_decay_factor', 0.94, 'Learning rate decay factor.')
tf.app.flags.DEFINE_float(
    'num_epochs_per_decay', 2.0,
    'Number of epochs after which learning rate decays.')
tf.app.flags.DEFINE_float(
    'moving_average_decay', None,
    'The decay to use for the moving average. '
    'If left as None, then moving averages are not used.')
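To make the learning-rate flags concrete, here is a small sketch of how an exponential schedule with these defaults would evolve. It assumes a staircase schedule whose decay period is num_samples / batch_size * num_epochs_per_decay steps, which is how tf.train.exponential_decay behaves with staircase=True. The helper name is hypothetical, and the sample count of 17125 is borrowed from the comment next to num_samples later in this article purely for illustration.

```python
def exponential_lr(step, base_lr=0.01, decay_factor=0.94,
                   num_samples=17125, batch_size=32, epochs_per_decay=2.0):
    """Staircase exponential decay (illustrative helper, not part of the repo)."""
    # Number of optimizer steps between two consecutive decays.
    decay_steps = int(num_samples / batch_size * epochs_per_decay)
    # Staircase: the exponent only increases once per full decay period.
    return base_lr * decay_factor ** (step // decay_steps)


if __name__ == "__main__":
    for step in (0, 1070, 2140, 50000):
        print(step, exponential_lr(step))
```

With these assumed numbers the rate is multiplied by 0.94 once every 1070 steps, so it shrinks slowly over the 50000-step budget set by max_number_of_steps.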

The fifth group contains the dataset parameters:

# =========================================================================== #
# Dataset Flags.
# =========================================================================== #
# Dataset name
tf.app.flags.DEFINE_string(
    'dataset_name', 'imagenet', 'The name of the dataset to load.')
# Number of classes in the dataset
tf.app.flags.DEFINE_integer(
    'num_classes', 21, 'Number of classes to use in the dataset.')
# Train or test split
tf.app.flags.DEFINE_string(
    'dataset_split_name', 'train', 'The name of the train/test split.')
# Dataset directory
tf.app.flags.DEFINE_string(
    'dataset_dir', None, 'The directory where the dataset files are stored.')
tf.app.flags.DEFINE_integer(
    'labels_offset', 0,
    'An offset for the labels in the dataset. This flag is primarily used to '
    'evaluate the VGG and ResNet architectures which do not use a background '
    'class for the ImageNet dataset.')
tf.app.flags.DEFINE_string(
    'model_name', 'ssd_300_vgg', 'The name of the architecture to train.')
tf.app.flags.DEFINE_string(
    'preprocessing_name', None, 'The name of the preprocessing to use. If left '
    'as `None`, then the model_name flag is used.')
# Size of each training batch
tf.app.flags.DEFINE_integer(
    'batch_size', 32, 'The number of samples in each batch.')
# Size of the training images
tf.app.flags.DEFINE_integer(
    'train_image_size', None, 'Train image size')
# Maximum number of training steps
tf.app.flags.DEFINE_integer('max_number_of_steps', 50000,
                            'The maximum number of training steps.')

The sixth group contains the parameters needed to fine-tune an existing model:

# =========================================================================== #
# Fine-Tuning Flags.
# =========================================================================== #
# These parameters are used when fine-tuning an existing model
# Location of the original model
tf.app.flags.DEFINE_string(
    'checkpoint_path', None,
    'The path to a checkpoint from which to fine-tune.')
tf.app.flags.DEFINE_string(
    'checkpoint_model_scope', None,
    'Model scope in the checkpoint. None if the same as the trained model.')
# Scopes whose variables are skipped when restoring
tf.app.flags.DEFINE_string(
    'checkpoint_exclude_scopes', None,
    'Comma-separated list of scopes of variables to exclude when restoring '
    'from a checkpoint.')
# Scopes whose variables are trained (None trains all variables)
tf.app.flags.DEFINE_string(
    'trainable_scopes', None,
    'Comma-separated list of scopes to filter the set of variables to train. '
    'By default, None would train all the variables.')
# Ignore variables missing from the checkpoint
tf.app.flags.DEFINE_boolean(
    'ignore_missing_vars', False,
    'When restoring a checkpoint would ignore missing variables.')

FLAGS = tf.app.flags.FLAGS

I have annotated the meaning of every parameter. Some of them need to be changed for an actual training run. The list looks long, but it only covers what any network training setup requires:

network parameters; general training parameters (progress logging, checkpoint schedule and so on); optimizer parameters; learning-rate parameters; dataset parameters; and settings for fine-tuning an existing model.
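In practice only a handful of these flags change between runs. The sketch below uses the standard-library argparse module as a stand-in for tf.app.flags (an assumption made so that it runs without TensorFlow) to show how defaults are declared once and then overridden on the command line. The flag names and defaults are copied from the listings above; the dataset directory is hypothetical.

```python
import argparse


def build_parser():
    """Declare a few of the flags above with the same names and defaults."""
    parser = argparse.ArgumentParser(description="SSD training flags (sketch)")
    parser.add_argument("--dataset_name", default="imagenet")
    parser.add_argument("--dataset_dir", default=None)
    parser.add_argument("--num_classes", type=int, default=21)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--max_number_of_steps", type=int, default=50000)
    return parser


if __name__ == "__main__":
    # Hypothetical overrides for a PASCAL VOC 2012 run.
    flags = build_parser().parse_args([
        "--dataset_name", "pascalvoc_2012",
        "--dataset_dir", "./tfrecords",
        "--batch_size", "16",
    ])
    print(flags.dataset_name, flags.batch_size, flags.learning_rate)
```

Flags that are not passed keep their declared defaults, which is why dataset_name and dataset_dir are the ones that most often need an explicit value.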

2. Reading the dataset

The training script reads the dataset through the following call:

########################## Read the dataset #############################
# Select the dataset
dataset = dataset_factory.get_dataset(
    FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir)

dataset_factory holds the functions that fetch and handle datasets. It covers four datasets and uses datasets_map to store the handling code for each of them.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datasets import cifar10
from datasets import imagenet
from datasets import pascalvoc_2007
from datasets import pascalvoc_2012

datasets_map = {
    'cifar10': cifar10,
    'imagenet': imagenet,
    'pascalvoc_2007': pascalvoc_2007,
    'pascalvoc_2012': pascalvoc_2012,
}


def get_dataset(name, split_name, dataset_dir, file_pattern=None, reader=None):
    """Given a dataset name and a split name, return a dataset.

    Args:
        name: String, the name of the dataset.
        split_name: Train or test split.
        dataset_dir: Directory where the dataset files are stored.
        file_pattern: File pattern used to match the dataset source files.
        reader: A subclass of tf.ReaderBase. If left as None, the default
            reader defined by each dataset is used.
    Returns:
        The dataset.
    """
    if name not in datasets_map:
        raise ValueError('Name of dataset unknown %s' % name)
    return datasets_map[name].get_split(split_name,
                                        dataset_dir,
                                        file_pattern,
                                        reader)

We use the pascalvoc_2012 data here, so returning datasets_map[name].get_split(...) actually calls:

pascalvoc_2012.get_split(split_name,
                         dataset_dir,
                         file_pattern,
                         reader)

get_split in pascalvoc_2012 runs as shown below. Here file_pattern = voc_2012_%s_*.tfrecord is the default naming pattern for the training data. Actual tfrecord files are named like voc_2012_train_001.tfrecord, so files with such names can be read:

def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
    """Gets a dataset tuple with instructions for reading ImageNet.

    Args:
        split_name: Train or test split.
        dataset_dir: Location of the dataset.
        file_pattern: File pattern to use when matching the dataset sources.
            The pattern is assumed to contain a '%s' string so that the
            split name can be inserted.
        reader: The TensorFlow reader type.
    Returns:
        The dataset.
    """
    if not file_pattern:
        file_pattern = FILE_PATTERN
    return pascalvoc_common.get_split(split_name, dataset_dir,
                                      file_pattern, reader,
                                      SPLITS_TO_SIZES,
                                      ITEMS_TO_DESCRIPTIONS,
                                      NUM_CLASSES)
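The way the '%s' placeholder and the '*' wildcard in the file pattern resolve can be sketched with the standard library alone. This is only an illustration: the helper names, the directory and the file list are hypothetical, and fnmatch stands in for the glob matching that the data reader performs on disk.

```python
import fnmatch
import os

FILE_PATTERN = 'voc_2012_%s_*.tfrecord'


def resolve_pattern(dataset_dir, split_name, file_pattern=FILE_PATTERN):
    """Insert the split name into the pattern and prefix the dataset directory."""
    return os.path.join(dataset_dir, file_pattern % split_name)


def matching_files(filenames, dataset_dir, split_name):
    """Return the file names that the resolved glob pattern would select."""
    pattern = os.path.basename(resolve_pattern(dataset_dir, split_name))
    return [name for name in filenames if fnmatch.fnmatch(name, pattern)]


if __name__ == "__main__":
    names = ['voc_2012_train_001.tfrecord',
             'voc_2012_test_001.tfrecord',
             'notes.txt']
    print(resolve_pattern('./tfrecords', 'train'))
    print(matching_files(names, './tfrecords', 'train'))
```

Only the file whose name contains the requested split survives the match, which is why the tfrecord files have to follow the voc_2012_<split>_<index>.tfrecord naming scheme.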

Once execution reaches pascalvoc_common, the tfrecord files are actually parsed. The comments in the code describe the process: each record is first decoded according to keys_to_features, and the decoded values are then stored in the dataset in the layout given by items_to_handlers:

def get_split(split_name, dataset_dir, file_pattern, reader,
              split_to_sizes, items_to_descriptions, num_classes):
    """Gets a dataset tuple with instructions for reading Pascal VOC dataset.

    Given a split name, return the corresponding dataset.

    Args:
        split_name: Train or test split.
        dataset_dir: Directory where the dataset files are stored.
        file_pattern: File pattern used to match the dataset source files.
        reader: A subclass of tf.ReaderBase. If left as None, the default
            reader defined by each dataset is used.
    Returns:
        The dataset.
    """
    if split_name not in split_to_sizes:
        raise ValueError('split name %s was not recognized.' % split_name)
    # file_pattern becomes the location of the tfrecord files
    file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
    # Fall back to the default reader when none is given
    if reader is None:
        reader = tf.TFRecordReader
    # Fields stored in each VOC record
    keys_to_features = {
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
        'image/height': tf.FixedLenFeature([1], tf.int64),
        'image/width': tf.FixedLenFeature([1], tf.int64),
        'image/channels': tf.FixedLenFeature([1], tf.int64),
        'image/shape': tf.FixedLenFeature([3], tf.int64),
        'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/label': tf.VarLenFeature(dtype=tf.int64),
        'image/object/bbox/difficult': tf.VarLenFeature(dtype=tf.int64),
        'image/object/bbox/truncated': tf.VarLenFeature(dtype=tf.int64),
    }
    # How each item is decoded
    items_to_handlers = {
        'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
        'shape': slim.tfexample_decoder.Tensor('image/shape'),
        'object/bbox': slim.tfexample_decoder.BoundingBox(
            ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'),
        'object/label': slim.tfexample_decoder.Tensor('image/object/bbox/label'),
        'object/difficult': slim.tfexample_decoder.Tensor('image/object/bbox/difficult'),
        'object/truncated': slim.tfexample_decoder.Tensor('image/object/bbox/truncated'),
    }
    # Decode the keys_to_features fields of each record into items_to_handlers
    decoder = slim.tfexample_decoder.TFExampleDecoder(
        keys_to_features, items_to_handlers)

    labels_to_names = None
    if dataset_utils.has_labels(dataset_dir):
        labels_to_names = dataset_utils.read_label_file(dataset_dir)

    return slim.dataset.Dataset(
        data_sources=file_pattern,                    # data source
        reader=reader,                                # tf.TFRecordReader
        decoder=decoder,                              # decoder
        num_samples=split_to_sizes[split_name],       # 17125
        items_to_descriptions=items_to_descriptions,  # description of each item
        num_classes=num_classes,                      # number of classes
        labels_to_names=labels_to_names)

All of this ultimately returns a slim.dataset.Dataset object. The chain of function calls exists only to select the code for the requested dataset.
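Conceptually, the 'object/bbox' handler gathers the four per-coordinate lists of a record into one [ymin, xmin, ymax, xmax] row per object. The following plain-Python sketch illustrates that idea only; it does not use the slim API, and the record contents and the helper name are hypothetical.

```python
def assemble_bboxes(features, prefix='image/object/bbox/',
                    keys=('ymin', 'xmin', 'ymax', 'xmax')):
    """Stack per-coordinate lists into one [ymin, xmin, ymax, xmax] row per object."""
    columns = [features[prefix + key] for key in keys]
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Hypothetical decoded record with two annotated objects (normalized coordinates).
    record = {
        'image/object/bbox/xmin': [0.10, 0.50],
        'image/object/bbox/ymin': [0.20, 0.40],
        'image/object/bbox/xmax': [0.60, 0.90],
        'image/object/bbox/ymax': [0.80, 0.70],
        'image/object/bbox/label': [12, 15],
    }
    print(assemble_bboxes(record))
```

The key order passed to the handler fixes the column order of the boxes, which is why the later encoding code indexes bbox[0] as ymin and bbox[1] as xmin.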

3. Building the SSD network

Building the SSD network is not complicated and involves few function calls. If you already understand the prediction part of SSD, the execution is easy to follow, so I only outline the logic here:

1. Obtain the SSDNet class with ssd_class = ssd_vgg_300.SSDNet.

2. Replace the num_classes parameter with the number of classes.

3. Build the network with ssd_net = ssd_class(ssd_params).

4. Obtain the prior boxes (anchors).

The code that does this is:

########################### Build the SSD network ##############################
# Get the SSD network class and its prior boxes
ssd_class = ssd_vgg_300.SSDNet
# Replace the num_classes parameter
ssd_params = ssd_class.default_params._replace(num_classes=FLAGS.num_classes)
# Build the network with the updated parameters
ssd_net = ssd_class(ssd_params)
# Get the prior boxes
ssd_shape = ssd_net.params.img_shape
ssd_anchors = ssd_net.anchors(ssd_shape)  # prior boxes of the six feature layers
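The _replace call works because the default parameters are stored in a namedtuple, and the six feature layers together determine how many prior boxes the network predicts. The sketch below illustrates both points with a reduced, hypothetical parameter tuple. The feature-map sizes and the number of anchors per cell are the usual SSD300 configuration and are an assumption here, since the article does not list them.

```python
from collections import namedtuple

# Hypothetical, reduced stand-in for the network's parameter tuple.
SSDParams = namedtuple('SSDParams', ['img_shape', 'num_classes',
                                     'feat_shapes', 'anchors_per_cell'])

default_params = SSDParams(
    img_shape=(300, 300),
    num_classes=21,
    feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
    anchors_per_cell=[4, 6, 6, 6, 4, 4],
)


def count_anchors(params):
    """Total number of prior boxes over all feature layers."""
    return sum(h * w * k for (h, w), k
               in zip(params.feat_shapes, params.anchors_per_cell))


if __name__ == "__main__":
    params = default_params._replace(num_classes=5)  # returns a modified copy
    print(params.num_classes, default_params.num_classes, count_anchors(params))
```

Note that _replace leaves default_params untouched and returns a new tuple, so changing the class count for one run does not affect the class-level defaults. Under the assumed configuration the six layers add up to 8732 prior boxes.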

4. Preprocessing the dataset

The preprocessing code is fairly long, but its logic is easy to follow:

1. Get the preprocessing name.

2. Get the preprocessing function.

3. Use DatasetDataProvider to serve data from the dataset and prefetch it.

4. Get the raw image, its labels and the positions of its ground-truth boxes.

5. Preprocess the image, the labels and the box positions.

The implementation is:

########################### Preprocess the dataset ##############################
# preprocessing_name resolves to ssd_300_vgg
preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name
# Look up the preprocessing function by name
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
    preprocessing_name, is_training=True)
# Print the configuration
tf_utils.print_configuration(FLAGS.__flags, ssd_params,
                             dataset.data_sources, FLAGS.train_dir)
# DatasetDataProvider serves data from the dataset. Depending on its
# configuration it can use several readers or a single one, and the
# records it reads can be shuffled.
# Prefetching
with tf.name_scope(FLAGS.dataset_name + '_data_provider'):
    provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset,
        num_readers=FLAGS.num_readers,
        common_queue_capacity=20 * FLAGS.batch_size,
        common_queue_min=10 * FLAGS.batch_size,
        shuffle=True)
# Get the raw image, its labels and the ground-truth box positions
[image, _, glabels, gbboxes] = provider.get(['image', 'shape',
                                             'object/label',
                                             'object/bbox'])
# Preprocess the image, the labels and the box positions
image, glabels, gbboxes = \
    image_preprocessing_fn(image, glabels, gbboxes,
                           out_shape=ssd_shape,
                           data_format=DATA_FORMAT)

The steps most likely to cause confusion here are the second and the fifth. Step five simply calls the image preprocessing function obtained in step two, so it is enough to understand how step two obtains that function.

The code that obtains the preprocessing function is:

# Look up the preprocessing function by name
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
    preprocessing_name, is_training=True)

The folder containing preprocessing_factory holds all the image-processing code. Calling get_preprocessing returns a function named preprocessing_fn.

That function returns the result of ssd_vgg_preprocessing.preprocess_image.

During training, ssd_vgg_preprocessing.preprocess_image in turn returns the result of preprocess_for_train.

The get_preprocessing code in preprocessing_factory is:

def get_preprocessing(name, is_training=False):
    preprocessing_fn_map = {
        'ssd_300_vgg': ssd_vgg_preprocessing
    }
    if name not in preprocessing_fn_map:
        raise ValueError('Preprocessing name [%s] was not recognized' % name)

    def preprocessing_fn(image, labels, bboxes,
                         out_shape, data_format='NHWC', **kwargs):
        # This actually calls ssd_vgg_preprocessing.preprocess_image
        return preprocessing_fn_map[name].preprocess_image(
            image, labels, bboxes, out_shape, data_format=data_format,
            is_training=is_training, **kwargs)
    return preprocessing_fn
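The pattern used by get_preprocessing is a factory that returns a closure: the inner function remembers name and is_training from the call that created it. Here is a minimal stand-alone sketch of that pattern. The train_fn and eval_fn helpers are hypothetical placeholders for the real preprocessing routines and only tag their input.

```python
def get_preprocessing(name, is_training=False):
    """Return a function that remembers `name` and `is_training`."""
    # Hypothetical stand-ins for the real preprocessing routines.
    def train_fn(image):
        return ('train', image)

    def eval_fn(image):
        return ('eval', image)

    preprocessing_fn_map = {'ssd_300_vgg': (train_fn, eval_fn)}
    if name not in preprocessing_fn_map:
        raise ValueError('Preprocessing name [%s] was not recognized' % name)

    def preprocessing_fn(image):
        train, evaluate = preprocessing_fn_map[name]
        # `is_training` is captured from the enclosing call.
        return train(image) if is_training else evaluate(image)

    return preprocessing_fn


if __name__ == "__main__":
    fn = get_preprocessing('ssd_300_vgg', is_training=True)
    print(fn('image-0'))
```

Because the mode is fixed when the function is created, the training loop can call image_preprocessing_fn without passing is_training again.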

The preprocess_image code in ssd_vgg_preprocessing is:

def preprocess_image(image,
                     labels,
                     bboxes,
                     out_shape,
                     data_format,
                     is_training=False,
                     **kwargs):
    """Pre-process an given image.

    Args:
        image: A `Tensor` representing an image of arbitrary size.
        output_height: Height of the image after preprocessing.
        output_width: Width of the image after preprocessing.
        is_training: True if the image is being preprocessed for training,
            False otherwise.
        resize_side_min: Lower bound for the smallest side of the image for
            aspect-preserving resizing. If `is_training` is False, this
            value is used for rescaling.
        resize_side_max: Upper bound for the smallest side of the image for
            aspect-preserving resizing. If `is_training` is False, this
            value is used for rescaling; otherwise the resize side is
            sampled from [resize_size_min, resize_size_max].
    Returns:
        The preprocessed image.
    """
    if is_training:
        return preprocess_for_train(image, labels, bboxes,
                                    out_shape=out_shape,
                                    data_format=data_format)
    else:
        return preprocess_for_eval(image, labels, bboxes,
                                   out_shape=out_shape,
                                   data_format=data_format,
                                   **kwargs)

So during training the dataset is ultimately processed by preprocess_for_train.

preprocess_for_train performs the following steps:

1. Convert the data type.

2. Distort the bounding boxes (random crop).

3. Resize the image to the output size.

4. Randomly flip the image horizontally.

5. Randomly distort the colors. There are four ways to do this.

6. Subtract the mean from the image.

The code is:

def preprocess_for_train(image, labels, bboxes,
                         out_shape, data_format='NHWC',
                         scope='ssd_preprocessing_train'):
    """Preprocesses the given image for training.

    Note that the actual resizing scale is sampled from
    [`resize_size_min`, `resize_size_max`].

    Args:
        image: An image of arbitrary size.
        output_height: Height of the image after preprocessing.
        output_width: Width of the image after preprocessing.
        resize_side_min: Lower bound for the smallest side of the image for
            aspect-preserving resizing.
        resize_side_max: Upper bound for the smallest side of the image for
            aspect-preserving resizing.
    Returns:
        The preprocessed image.
    """
    fast_mode = False
    with tf.name_scope(scope, 'ssd_preprocessing_train', [image, labels, bboxes]):
        if image.get_shape().ndims != 3:
            raise ValueError('Input must be of size [height, width, C>0]')
        # Convert the image data type
        if image.dtype != tf.float32:
            image = tf.image.convert_image_dtype(image, dtype=tf.float32)
        # Distort the bounding boxes
        dst_image = image
        dst_image, labels, bboxes, _ = \
            distorted_bounding_box_crop(image, labels, bboxes,
                                        min_object_covered=MIN_OBJECT_COVERED,
                                        aspect_ratio_range=CROP_RATIO_RANGE)
        # Resize the image to the output size
        dst_image = tf_image.resize_image(dst_image, out_shape,
                                          method=tf.image.ResizeMethod.BILINEAR,
                                          align_corners=False)
        # Randomly flip the image horizontally
        dst_image, bboxes = tf_image.random_flip_left_right(dst_image, bboxes)
        # Randomly distort the colors. There are four ways to do this.
        dst_image = apply_with_random_selector(
            dst_image,
            lambda x, ordering: distort_color(x, ordering, fast_mode),
            num_cases=4)
        # Subtract the mean from the image
        image = dst_image * 255.
        image = tf_image_whitened(image, [_R_MEAN, _G_MEAN, _B_MEAN])
        # Image data layout
        if data_format == 'NCHW':
            image = tf.transpose(image, perm=(2, 0, 1))
        return image, labels, bboxes
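One detail worth spelling out is that geometric augmentations must transform the boxes together with the pixels. For a horizontal flip with boxes stored as normalized [ymin, xmin, ymax, xmax], the new xmin is 1 - xmax and the new xmax is 1 - xmin. The sketch below shows this in plain Python on nested lists; the function names are hypothetical and it illustrates the idea rather than the tf_image implementation.

```python
def flip_bboxes_left_right(bboxes):
    """Mirror [ymin, xmin, ymax, xmax] boxes (normalized to [0, 1]) horizontally."""
    return [[ymin, 1.0 - xmax, ymax, 1.0 - xmin]
            for ymin, xmin, ymax, xmax in bboxes]


def flip_image_left_right(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    bboxes = [[0.25, 0.125, 0.75, 0.5]]
    print(flip_image_left_right(image))
    print(flip_bboxes_left_right(bboxes))
```

Flipping twice returns the original boxes, which is a quick sanity check for any augmentation of this kind.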

5. Encoding the boxes

This part invokes the box-encoding code through the following call:

gclasses, glocalisations, gscores = ssd_net.bboxes_encode(glabels, gbboxes, ssd_anchors)

The bboxes_encode method itself calls tf_ssd_bboxes_encode from the ssd_common module.

def bboxes_encode(self, labels, bboxes, anchors,
                  scope=None):
    """Encode the labels and bounding boxes."""
    return ssd_common.tf_ssd_bboxes_encode(
        labels, bboxes, anchors,
        self.params.num_classes,
        self.params.no_annotation_label,
        ignore_threshold=0.5,
        prior_scaling=self.params.prior_scaling,
        scope=scope)

ssd_common.tf_ssd_bboxes_encode runs the encoding on every feature layer in turn.

def tf_ssd_bboxes_encode(labels,
                         bboxes,
                         anchors,
                         num_classes,
                         no_annotation_label,
                         ignore_threshold=0.5,
                         prior_scaling=[0.1, 0.1, 0.2, 0.2],
                         dtype=tf.float32,
                         scope='ssd_bboxes_encode'):
    """Encode the ground truth for every feature layer."""
    with tf.name_scope(scope):
        target_labels = []
        target_localizations = []
        target_scores = []
        for i, anchors_layer in enumerate(anchors):
            with tf.name_scope('bboxes_encode_block_%i' % i):
                t_labels, t_loc, t_scores = \
                    tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                               num_classes, no_annotation_label,
                                               ignore_threshold,
                                               prior_scaling, dtype)
                target_labels.append(t_labels)
                target_localizations.append(t_loc)
                target_scores.append(t_scores)
        return target_labels, target_localizations, target_scores

The actual encoding happens in tf_ssd_bboxes_encode_layer, which works as follows:

1. Create a set of variables to store the encoding results.

    yref, xref, href, wref = anchors_layer
    ymin = yref - href / 2.
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    vol_anchors = (xmax - xmin) * (ymax - ymin)
    # 1. Create a set of variables to store the encoding results
    # Shape of this feature layer
    shape = (yref.shape[0], yref.shape[1], href.size)
    # Label of each anchor at each position of the feature layer
    feat_labels = tf.zeros(shape, dtype=tf.int64)  # (m, m, k)
    # Score of each anchor at each position of the feature layer
    feat_scores = tf.zeros(shape, dtype=dtype)
    # Box coordinates of each anchor at each position of the feature layer
    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)

2. For every ground-truth box, find the positions and anchors of the feature layer that correspond to it, and record its label.

    # Computes the IoU
    def jaccard_with_anchors(bbox):
        int_ymin = tf.maximum(ymin, bbox[0])  # (m, m, k)
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        # Relate the anchors to the bbox
        inter_vol = h * w  # intersection area
        union_vol = vol_anchors - inter_vol \
            + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])  # union area
        jaccard = tf.div(inter_vol, union_vol)  # intersection / union, i.e. IoU
        return jaccard  # (m, m, k)

    def condition(i, feat_labels, feat_scores,
                  feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        r = tf.less(i, tf.shape(labels))
        return r[0]

    # Finds which anchors of the feature layer each ground-truth box matches
    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Update feature labels, scores and bboxes.

        - assign values when Jaccard > 0.5;
        """
        # Take the i-th label and the i-th bbox
        label = labels[i]  # label of the i-th object in the current image
        bbox = bboxes[i]   # ground-truth box of the i-th object in the current image
        # Compute the IoU between this box and all anchor boxes
        jaccard = jaccard_with_anchors(bbox)  # IoU of this bbox with the anchors of this layer
        # Select every anchor whose IoU beats its previous best score
        mask = tf.greater(jaccard, feat_scores)  # True where the IoU exceeds the stored score
        mask = tf.logical_and(mask, feat_scores > -0.5)
        imask = tf.cast(mask, tf.int64)  # [1, 0, 1, 1, 0]
        fmask = tf.cast(mask, dtype)     # [1., 0., 1., 0., ...]
        # Update values using mask.
        # feat_labels keeps the label of the best-scoring object at each
        # position and feat_scores keeps that score
        # (m, m, k) * current label + (1 - (m, m, k)) * (m, m, k)
        # imask guarantees that the current object beats the previous one
        # where it is True; all other positions keep their values
        feat_labels = imask * label + (1 - imask) * feat_labels
        # Keeps the best-matching score
        feat_scores = tf.where(mask, jaccard, feat_scores)
        # The four tensors below store the ground-truth box coordinates
        # belonging to the stored label
        # (m, m, k) * current coordinate (scalar) + (1 - (m, m, k)) * (m, m, k)
        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
        return [i + 1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]

    i = 0
    # 2. For every ground-truth box, find its positions and anchors in the
    # feature layer and record its label
    (i, feat_labels, feat_scores, feat_ymin, feat_xmin,
     feat_ymax, feat_xmax) = tf.while_loop(condition, body,
                                           [i, feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
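The matching loop is easier to follow without tensors. The plain-Python sketch below computes the IoU between a ground-truth box and a few anchors and keeps, for each anchor, the label, score and box of the object that overlaps it most. It is a simplified illustration under stated assumptions: anchors are given directly as [ymin, xmin, ymax, xmax] corners, the function names are hypothetical, and the extra mask condition on feat_scores is left out.

```python
def iou(anchor, bbox):
    """Jaccard overlap of two [ymin, xmin, ymax, xmax] boxes."""
    h = max(min(anchor[2], bbox[2]) - max(anchor[0], bbox[0]), 0.0)
    w = max(min(anchor[3], bbox[3]) - max(anchor[1], bbox[1]), 0.0)
    inter = h * w
    union = ((anchor[2] - anchor[0]) * (anchor[3] - anchor[1])
             + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1]) - inter)
    return inter / union


def match_anchors(anchors, labels, bboxes):
    """For each anchor keep the label, score and box of the best-overlapping object."""
    feat_labels = [0] * len(anchors)
    feat_scores = [0.0] * len(anchors)
    feat_boxes = [[0.0, 0.0, 1.0, 1.0] for _ in anchors]
    for label, bbox in zip(labels, bboxes):      # one pass per ground-truth box
        for k, anchor in enumerate(anchors):
            score = iou(anchor, bbox)
            if score > feat_scores[k]:           # plays the role of `mask`
                feat_labels[k] = label
                feat_scores[k] = score
                feat_boxes[k] = list(bbox)
    return feat_labels, feat_scores, feat_boxes


if __name__ == "__main__":
    anchors = [[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
    print(match_anchors(anchors, labels=[7], bboxes=[[0.0, 0.0, 0.5, 0.5]]))
```

Anchors that overlap no object keep label 0 and score 0, so they end up as background candidates when the loss is computed.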

3. Convert the result to the output format of the SSD network.

    # Transform to center / size.
    # 3. Convert to the output format of the SSD network
    feat_cy = (feat_ymax + feat_ymin) / 2.
    feat_cx = (feat_xmax + feat_xmin) / 2.
    feat_h = feat_ymax - feat_ymin
    feat_w = feat_xmax - feat_xmin
    # Encode features.
    # Apply the encoding formulas
    # Offset of the ground-truth box center from the anchor center,
    # measured in units of the anchor height and width
    feat_cy = (feat_cy - yref) / href / prior_scaling[0]
    feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
    # log((m, m, k) / (m, m, 1)) * 5
    # Ground-truth width and height divided by the anchor width and height,
    # then take the logarithm
    feat_h = tf.log(feat_h / href) / prior_scaling[2]
    feat_w = tf.log(feat_w / wref) / prior_scaling[3]
    # Use SSD ordering: x / y / w / h instead of ours. (m, m, k, 4)
    feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
    return feat_labels, feat_localizations, feat_scores

Once encoded, the ground-truth labels and boxes have the same format as the labels and boxes predicted by the network. Only then can the two be compared to compute the loss.
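The encoding formulas can be checked on a single box. The sketch below encodes one ground-truth box relative to one anchor with the same prior_scaling values as above and then inverts the transform to confirm that nothing is lost. The function names and the sample coordinates are hypothetical; the arithmetic follows the lines shown in step 3.

```python
import math

PRIOR_SCALING = (0.1, 0.1, 0.2, 0.2)


def encode(bbox, anchor, scaling=PRIOR_SCALING):
    """bbox = [ymin, xmin, ymax, xmax]; anchor = (yref, xref, href, wref)."""
    yref, xref, href, wref = anchor
    cy, cx = (bbox[0] + bbox[2]) / 2.0, (bbox[1] + bbox[3]) / 2.0
    h, w = bbox[2] - bbox[0], bbox[3] - bbox[1]
    # SSD ordering of the targets: x, y, w, h.
    return [(cx - xref) / wref / scaling[1],
            (cy - yref) / href / scaling[0],
            math.log(w / wref) / scaling[3],
            math.log(h / href) / scaling[2]]


def decode(target, anchor, scaling=PRIOR_SCALING):
    """Inverse of `encode`: recover [ymin, xmin, ymax, xmax]."""
    yref, xref, href, wref = anchor
    cx = target[0] * wref * scaling[1] + xref
    cy = target[1] * href * scaling[0] + yref
    w = wref * math.exp(target[2] * scaling[3])
    h = href * math.exp(target[3] * scaling[2])
    return [cy - h / 2.0, cx - w / 2.0, cy + h / 2.0, cx + w / 2.0]


if __name__ == "__main__":
    anchor = (0.5, 0.5, 0.4, 0.4)
    bbox = [0.2, 0.3, 0.8, 0.6]
    target = encode(bbox, anchor)
    print(target)
    print(decode(target, anchor))
```

Dividing by prior_scaling enlarges the targets (by a factor of 10 for the center offsets and 5 for the log sizes), which keeps the regression targets at a comparable magnitude.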

6. Computing the loss

The encoded scores and locations obtained in step five come from the dataset annotations, so they describe the ground truth. Computing the loss also requires the predictions.

The following code runs each image through the network to obtain its predictions:

# Set the arguments of the SSD network
arg_scope = ssd_net.arg_scope(weight_decay=FLAGS.weight_decay,
                              data_format=DATA_FORMAT)
# Run the images through the network to get the box locations and class predictions
with slim.arg_scope(arg_scope):
    _, localisations, logits, _ = \
        ssd_net.net(b_image, is_training=True)

Then the loss function is called to compute three loss values, for the positive samples, the negative samples and the localization respectively.

# Compute the loss
n_positives_loss, n_negative_loss, localization_loss = ssd_net.losses(
    logits, localisations,
    b_gclasses, b_glocalisations, b_gscores,
    match_threshold=FLAGS.match_threshold,
    negative_ratio=FLAGS.negative_ratio,
    alpha=FLAGS.loss_alpha,
    label_smoothing=FLAGS.label_smoothing)
# Three loss values are returned: positives, negatives and localization
loss_all = n_positives_loss + n_negative_loss + localization_loss

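Inside ssd_net.losses, the loss is computed as follows.

1. Flatten the tensors of every feature layer: class predictions become (N, num_classes), box predictions become (N, 4), and the ground-truth classes and scores become (N,), which simplifies the comparisons that follow. Finally, everything is concatenated into a single table covering the whole batch.

2. From gscores, build pmask for the anchors whose score qualifies as positive and nmask for those that do not. Because gscores is used, the positive/negative split is defined by the ground truth.

3. Set the negative positions to the background probability from the corresponding prediction, and all other positions to 1.

4. Find the n_neg anchors least likely to be background. They really are background, so the loss computed for them is large.

5. Compute the cross-entropy for the positive and negative samples, and the localization loss (smooth L1 in the original SSD) for the box positions.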
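Before the full function, steps 2 to 4 can be sketched in plain Python on a flat list of anchors. This is an illustration of hard negative mining under simplifying assumptions: it picks exactly n_neg negatives by sorting, whereas the TensorFlow code works with tf.nn.top_k, and the function name and sample numbers are hypothetical.

```python
def select_hard_negatives(gscores, background_prob, batch_size,
                          match_threshold=0.5, negative_ratio=3.0):
    """Return (positive indices, selected hard-negative indices)."""
    positives = [i for i, s in enumerate(gscores) if s > match_threshold]
    negatives = [i for i, s in enumerate(gscores)
                 if not s > match_threshold and s > -0.5]
    # Cap the negatives at ratio * positives, plus one per image in the batch.
    n_neg = min(int(negative_ratio * len(positives)) + batch_size, len(negatives))
    # Hardest negatives first: lowest predicted background probability.
    hardest = sorted(negatives, key=lambda i: background_prob[i])[:n_neg]
    return positives, sorted(hardest)


if __name__ == "__main__":
    gscores = [0.8, 0.1, 0.0, 0.3, 0.2, 0.0, 0.4, 0.05]
    background_prob = [0.1, 0.9, 0.2, 0.95, 0.3, 0.99, 0.5, 0.6]
    print(select_hard_negatives(gscores, background_prob, batch_size=1))
```

Limiting the negatives this way keeps the many easy background anchors from dominating the classification loss.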

def ssd_losses(logits, localisations,
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               device='/cpu:0',
               scope=None):
    with tf.name_scope(scope, 'ssd_losses'):
        lshape = tfe.get_shape(logits[0], 5)
        num_classes = lshape[-1]
        batch_size = lshape[0]
        # Flatten all tensors
        flogits = []
        fgclasses = []
        fgscores = []
        flocalisations = []
        fglocalisations = []
        for i in range(len(logits)):  # one entry per feature layer
            flogits.append(tf.reshape(logits[i], [-1, num_classes]))
            fgclasses.append(tf.reshape(gclasses[i], [-1]))
            fgscores.append(tf.reshape(gscores[i], [-1]))
            flocalisations.append(tf.reshape(localisations[i], [-1, 4]))
            fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4]))
        # Each list still holds one flattened tensor per feature layer;
        # concatenate them into a single table covering the whole batch
        logits = tf.concat(flogits, axis=0)
        gclasses = tf.concat(fgclasses, axis=0)
        gscores = tf.concat(fgscores, axis=0)
        localisations = tf.concat(flocalisations, axis=0)
        glocalisations = tf.concat(fglocalisations, axis=0)
        dtype = logits.dtype
        # Mask of the anchors whose gscores qualify as positive
        pmask = gscores > match_threshold
        fpmask = tf.cast(pmask, dtype)
        no_classes = tf.cast(pmask, tf.int32)
        nmask = tf.logical_and(tf.logical_not(pmask),  # anchors whose IoU is below the threshold
                               gscores > -0.5)
        fnmask = tf.cast(nmask, dtype)
        n_positives = tf.reduce_sum(fpmask)
        # Convert the logits to probabilities
        predictions = slim.softmax(logits)
        nvalues = tf.where(nmask,
                           predictions[:, 0],  # negatives: predicted background probability
                           1. - fnmask)        # all other positions: 1
        nvalues_flat = tf.reshape(nvalues, [-1])
        # max_neg_entries is the number of negatives actually available
        max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
        # n_neg = number of positives * 3 + batch_size; batch_size is added
        # because every image has at least one background negative
        n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size
        n_neg = tf.minimum(n_neg, max_neg_entries)
        # Find the n_neg anchors least likely to be background
        val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)

max_hard_pred
