SRPAR: anchor-free detector with aspect ratio priority for slender objects

Abstract. Slender objects are more difficult to detect than conventional objects with regular shapes, and the rotation of these objects makes detecting them even more challenging. To address these challenges, we propose SRPAR, an anchor-free detector with aspect ratio priority for slender objects. The aspect ratio priority factor is designed based on the object’s aspect ratio and rotation angle; it guides the regression of slender rotating objects and improves the regression accuracy. In addition, through the multi-level prediction of the feature pyramid network, the range of bounding boxes and corresponding angular regressions at each level is limited, and the regression of overlapping object prediction boxes is accelerated. To better evaluate SRPAR’s detection performance on slender rotating objects having an aspect ratio of at least 3:1, some images of sticks are added to the baseball bat subset of the common objects in context (COCO) dataset to form a new self-made COCO-Stick dataset. Experimental results on the dataset for object detection in aerial images (DOTA) and the self-made COCO-Stick dataset show that, compared with state-of-the-art detectors, the proposed method has some advantages in detection accuracy.


Introduction
With the rapid development of deep learning, many high-performance general object detection methods have been proposed. However, current general detectors are not sufficient for detecting some specific types of objects, such as sticks and other slender objects with aspect ratios of at least 3:1. Figure 1 shows a stick-like object from the common objects in context (COCO) 1 dataset, and the angle between the slender object and the horizontal ground is represented by θ. Regarding slender objects with large aspect ratios, Tian et al. 2 noted that, even with careful design, because the scales and aspect ratios of anchor boxes are kept fixed, detectors encounter difficulties when dealing with object candidates with large shape variations. Therefore, anchor-free object detectors may be better at detecting objects with large shape variations, such as slender objects. However, an anchor-free detector usually detects an object by locating the key points (corner points, extreme points, and center points) on that object. During the detection process, the key points on the object inevitably fall into multiple bounding boxes, and the network has difficulty choosing which bounding box to regress. A location that falls into overlapping bounding boxes is defined as a fuzzy sample. 2 In addition, the aspect ratios of objects in existing datasets are mostly between 1:1 and 3:1, which does not match the requirements of slender object detection. As a result, the main problems of anchor-free slender object detection include large aspect ratios, overlapping bounding boxes, and a lack of suitable datasets.
*Address all correspondence to Hong-Gang Xie, xiehg@hbut.edu.cn; Ming Yang, ymut@hbut.edu.cn
Many anchor-free object detectors have been proposed in recent years. Law and Deng proposed CornerNet, 3 which determines a bounding box by detecting and matching the two corner points at the object's upper left and lower right corners and then detects the object using edge information. Zhou et al. 4 proposed CenterNet, which represents the object by its center point, regresses some of the object's information at the center point, and then completes the object detection. When an anchor-free object detector detects objects, however, fuzzy samples that cause bounding boxes to overlap inevitably arise, lowering regression accuracy. Tian et al. 2 proposed the fully convolutional one-stage object detector (FCOS), which reduces the number of fuzzy samples by utilizing multi-level prediction to limit the range of bounding box regression at each level.
When the slender object and horizontal ground are at a certain angle, the horizontal box cannot accurately indicate the position information of the object. [5][6][7][8] Therefore, the rotating box may be better at revealing the slender object's position information. Zhang et al. 9 proposed R2PN, which is a rotating regional proposal network to generate multi-directional candidate proposals containing the object angle information, hence improving regional proposal quality. In addition, due to the slender object's unique shape with a large aspect ratio, which causes a considerable variation in the decay rates of its long and short sides during regression, positioning of the slender object during regression becomes inaccurate. As a result, the slender object's regression accuracy is reduced. To this end, Yang et al. 10 proposed the derivable skew intersection-over-union (IoU) loss, which improves the accuracy of regression of slender objects to some extent.
In response to the above problems, an anchor-free detector with aspect ratio priority for slender rotating objects (SRPAR) is proposed in this paper. First, to solve the problem that the long and short sides of a slender object do not decay at the same rate during regression, the aspect ratio priority factor was designed to guide the regression of the slender rotating object and improve its regression accuracy. Second, to reduce the impact of fuzzy samples on object detection accuracy, the feature pyramid network (FPN) 11 multi-level prediction was employed to limit the range of bounding boxes and corresponding angle regressions at each level, thus significantly reducing the number of fuzzy samples. Finally, the self-made COCO-Stick dataset was created to better evaluate SRPAR's detection performance, and a series of experiments were conducted on both the dataset for object detection in aerial images (DOTA) 12 and the self-made COCO-Stick dataset.
In summary, the main contributions of this work are as follows:
1. The aspect ratio priority factor is designed. It is utilized to guide the regression of slender rotating objects and improve regression accuracy.
2. The multi-level prediction of FPN is employed. Through multi-level FPN prediction, the range of bounding boxes and corresponding angular regressions at each level are limited, and the number of fuzzy samples is greatly reduced.
3. The self-made COCO-Stick dataset is created. It allows SRPAR's detection performance on slender rotating objects having aspect ratios of at least 3:1 to be better evaluated.
The remainder of the paper is structured as follows. First, related work is reviewed in Sec. 2. Second, in Sec. 3, the proposed SRPAR is introduced in detail. Then, in Sec. 4, related experiments are designed to evaluate SRPAR's detection performance. Finally, in Sec. 5, the conclusions of this research are given.

Related work
The majority of the objects detected by the proposed SRPAR are stick-like objects with large aspect ratios. Due to the unique shapes of these objects, as well as the fact that they lie at certain angles to the horizontal ground, their detection becomes even more challenging. Anchor-free object detectors may be better at detecting objects with large aspect ratios. The remainder of this section reviews the literature on anchor-free object detection and rotating object detection, from which the proposed SRPAR draws to improve its detection accuracy.

Anchor-Free Object Detection
The anchor-free object detection algorithm minimizes the amount of calculation by avoiding complex calculations on anchor boxes. The anchor-free algorithm primarily predicts the object's corner, extreme, or center points and then processes these key points to generate the object's detection box.
The anchor-free detection algorithm, based on corner or extreme points prediction, generates the detection box of an object by processing the bounding points. By detecting and matching the two corner points at the top left and bottom right of the object, the CornerNet 3 algorithm determines a bounding box using only edge information. However, the object's interior information is typically more discriminative, so ignoring it makes erroneous detection more likely. Zhou et al. 4 proposed CenterNet, which adds the detection of the center point of the object to the CornerNet 3 algorithm to overcome this problem and thereby improves the detection accuracy. Zhou et al. 13 proposed ExtremeNet, which locates the bounding box by locating the four extreme points at the top, bottom, left, and right edges of the object, as opposed to the previous two algorithms that determine the box by corner points. Yang et al. 14 proposed RepPoints, which is based on the detection of key points. Unlike CornerNet 3 and ExtremeNet, 13 RepPoints 14 does not have a matching problem between key points. Instead, it represents the object with a novel point set, and its object detection performance is promising.
The anchor-free algorithm, based on center prediction, locates the detection box by the distance from the center point to the four boundaries. FCOS 2 is an object detection algorithm based on pixel-level prediction that locates the anchor box by predicting the distance from the center point to the object boundary. Furthermore, none of the positions far from the object center produced high-quality prediction boxes; thus center-ness was added to suppress these low-quality prediction boxes, hence improving detection accuracy.

Rotating Object Detection
Because of the unique shape of slender objects' large aspect ratios, when rotations occur, significant variations in angle are caused, making detection of slender objects with angular rotations more challenging. R2PN 9 is a rotating regional proposal network that generates multi-directional candidate proposals containing object angle information, hence improving regional proposal quality. However, if the slender object's lean angle is relatively small, the slender object can easily be mistakenly detected as being horizontal. Xu et al. 15 proposed a gliding vertex, which proposes a tilt factor that efficiently solves the problem of near-horizontal object detection.
The majority of boundary discontinuities in regression-based rotating object detectors are caused by angular periodicity. 16,17 To this end, Yang et al. 16 proposed the IoU-smooth L1 loss, which addresses the boundary discontinuity problem induced by angular periodicity. When slender rotating objects are densely arranged, the object instances overlap heavily during the detection process, resulting in unsatisfactory detection. [18][19][20][21] To overcome this problem, Yang et al. 10 proposed a single-stage end-to-end detector with a combination of rotating and horizontal anchor boxes to adapt to a dense arrangement of rotating objects.
In summary, a new object detection model (SRPAR) is proposed in this paper. To address the problem that the long and short sides of a slender object do not decay at the same rate during regression, the aspect ratio priority factor is designed to guide the regression of slender objects, thus improving the regression accuracy. Furthermore, to reduce the impact of fuzzy samples on object detection, FPN 11 multi-level prediction is employed to limit the range of bounding boxes and corresponding angular regressions at each level, thus minimizing the number of fuzzy samples.

Our Approach
As shown in Fig. 2, the network structure is divided into four parts: the backbone network, the FPN, the feature selection module, and the multitasking subnetwork. Because the training dataset is not complicated and a balance between detection accuracy and speed is needed, ResNet-101 22 was chosen as the backbone network. After the fuzzy samples are reduced by FPN 11 multi-level prediction, the feature selection module inputs the feature-selected multi-scale feature map into the multitasking subnetwork. The multitasking subnetwork consists of regression and classification subnetworks. The categories are predicted by the classification subnetwork. The regression subnetwork is responsible for predicting the distances t, b, l, and r from the prediction point to the top, bottom, left, and right edges of the bounding box, respectively, as well as the corresponding rotation angle θ′ of the bounding box.

FPN Multi-Level Prediction
When an anchor-free object detector detects objects, fuzzy samples that cause bounding box overlap inevitably arise. A location will fall into multiple bounding boxes at the same time, and the network will have difficulty deciding which bounding box to regress. For the detection of horizontal objects, the range of the bounding box regression at each level is limited using FPN 11 multi-level prediction, and thus the number of fuzzy samples is reduced. However, in this paper, the object to be detected has a rotation angle, and therefore, only limiting the range of the bounding box regression without considering the angle is inappropriate. Therefore, through the multi-level prediction of FPN, 11 the range of bounding boxes and corresponding angular regressions at each level are limited, and the number of fuzzy samples is significantly reduced. The effectiveness of this approach is also confirmed in the ablation study in Sec. 4.1.
The regression subnetwork is in charge of regressing the five variables t, b, l, r, and θ′. Here, t, b, l, and r are the distances between the prediction point and the bounding box's top, bottom, left, and right edges, respectively, and θ′ is the rotated object's prediction angle. The mapping between the five variables t, b, l, r, θ′ and the ground-truth box defines a prediction box when the prediction box overlaps with the object bounding box. The corresponding bounding box is shown in Fig. 3. The coordinates of the upper left and bottom right corners of the bounding box are B(x_B, y_B) and A(x_A, y_A), respectively. The coordinates of the two points are expressed in terms of the object's five-parameter label (x_c, y_c, w, h, θ). The bounding box's rotation angle is θ, which denotes the acute angle of the bounding box to the x axis. According to the long-edge definition method of angles, 17 θ ∈ [−π/2, π/2). When any prediction point p(x, y) falls in the object bounding box, the prediction point is considered to be a positive sample, and the regression subnetwork regresses the five variables t, b, l, r, θ′. The four regression variables t, b, l, and r are expressed in terms of the three points B(x_B, y_B), A(x_A, y_A), and p(x, y), as shown in Eq. (1).
The range of bounding box regression at each level is restricted by FPN 11 multi-level prediction. Specifically, five levels of feature maps defined as {P3, P4, P5, P6, P7} are employed. P3, P4, and P5 are produced from the backbone convolutional neural network (CNN) feature maps C3, C4, and C5, each followed by a 1 × 1 convolutional layer, as shown in Fig. 4. P6 and P7 are obtained by applying one convolutional layer with a stride of 2 on P5 and P6, respectively. The size of the convolution kernel is 3 × 3.
H × W represents the height and width of feature maps, and the input images in the model are all 600 × 400 in size.
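The regression targets described in this subsection can be sketched as follows. Eq. (1) is not reproduced in the text above, so this is only an illustrative sketch under stated assumptions: the box is treated as axis-aligned, image coordinates are used (y grows downward), and B and A are the upper left and bottom right corners as defined in the text. The names `Targets` and `regression_targets` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

Point = Tuple[float, float]


class Targets(NamedTuple):
    t: float  # distance from p to the top edge
    b: float  # distance from p to the bottom edge
    l: float  # distance from p to the left edge
    r: float  # distance from p to the right edge


def regression_targets(p: Point, top_left: Point, bottom_right: Point) -> Optional[Targets]:
    """Return (t, b, l, r) for a prediction point p, or None if p is outside the box.

    Assumption: axis-aligned box in image coordinates; this is the usual
    FCOS-style definition, not necessarily the paper's Eq. (1).
    """
    x, y = p
    x_b, y_b = top_left        # B(x_B, y_B) in the text
    x_a, y_a = bottom_right    # A(x_A, y_A) in the text
    t, b = y - y_b, y_a - y
    l, r = x - x_b, x_a - x
    if min(t, b, l, r) < 0:    # p lies outside the box: not a positive sample
        return None
    return Targets(t, b, l, r)
```

For example, `regression_targets((30, 20), (10, 5), (110, 45))` gives the four distances for a point inside a 100 × 40 box, while a point outside the box yields `None`.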
For the fuzzy sample problem, the specific approach chosen is to compute the regression variables t, b, l, and r for the position of each predicted point on each feature level. Limiting only the regression range of the bounding box without the corresponding angle is not suitable. If the position at feature level j (j refers to any level) satisfies |θ′| ≤ π/2 and m_{j−1} ≤ max(l, t, r, b) ≤ m_j, the position at that feature level is set as a positive sample, and a bounding box is regressed. Otherwise, no regression is required. Here, m_j is the maximum distance to be regressed for a pixel at feature level j. In this study, m_2, m_3, m_4, m_5, m_6, and m_7 are set to 0, 38, 75, 150, 300, and ∞, respectively. The specific values of 38, 75, 150, and 300 follow from the input image size of 600 × 400: the maximum pixel distance over which a point on the image needs to be regressed is 600. This maximum pixel distance of 600 is divided by the convolution stride of 2, and the resulting value of 300 is taken as the maximum distance m_6, which is then divided by the convolution stride of 2, and the maximum distance of
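The level-assignment rule stated in this paragraph can be sketched as a small lookup. The thresholds are the m values listed in the text; the function name `assign_level` and the choice to return the lowest matching level when a distance sits exactly on a shared boundary (the rule uses ≤ on both sides) are assumptions made for illustration.

```python
import math
from typing import Optional

# Thresholds m_2 .. m_7 from the text; level j handles m_{j-1} <= max(l, t, r, b) <= m_j.
M = {2: 0.0, 3: 38.0, 4: 75.0, 5: 150.0, 6: 300.0, 7: math.inf}


def assign_level(l: float, t: float, r: float, b: float, theta_pred: float) -> Optional[int]:
    """Return the FPN level j (3..7) at which this location is a positive sample.

    Returns None when the angle condition |theta'| <= pi/2 fails or when no
    level's distance range contains max(l, t, r, b).
    """
    if abs(theta_pred) > math.pi / 2:
        return None
    d = max(l, t, r, b)
    for j in range(3, 8):
        # Boundary values satisfy two adjacent ranges; the lower level wins here (assumption).
        if M[j - 1] <= d <= M[j]:
            return j
    return None
```

A location whose largest distance is 80 pixels falls in the range [75, 150] and is therefore handled at level 5, whereas the same location with a predicted angle outside [−π/2, π/2] is not regressed at any level.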

Aspect Ratio Priority
Recently, anchor-free object detectors have attracted considerable attention because of being easy to operate and having superior performance. The anchor-free object detection algorithm minimizes the amount of calculation by avoiding complex calculations on anchor boxes. Furthermore, anchor-free object detectors can already achieve detection accuracy that is similar or even superior to traditional object detectors based on anchor boxes, under the equivalent testing conditions. 10 Slender objects with large aspect ratios may be better indicated by anchor-free object detectors, which would improve slender object detection accuracy.
A slender object with a large aspect ratio decays faster on the long side and slower on the short side during regression, which results in inconsistent regression decay rates between the long and short sides. The larger the aspect ratio of the slender object is, the more obvious this difference in decay rates is, leading to inaccurate positioning of the slender object during regression, which ultimately reduces the regression accuracy of the slender object. To solve this problem, this study designs the aspect ratio priority factor, which is represented by Eq. (2). In Eq. (2), τ is the inverse of the aspect ratio of the object; it is represented by Eq. (3)

$$\tau = \frac{\min(w, h)}{\max(w, h)}. \tag{3}$$

In Eq. (3), w and h are the width and height of the ground-truth box, respectively. In SRPAR's loss function, the aspect ratio priority factor is placed in the regression loss as a penalty weight during long-edge regression [this is reflected in Eq. (5)] and is used to guide the regression of the slender object. The aspect ratio priority factor restrains the regression rate of the long side of the object during regression, thus reducing the discrepancy between the decay rates of the long and short sides of the object and ultimately improving the regression accuracy of the slender object.
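As a small worked illustration of Eq. (3), the inverse aspect ratio τ can be computed directly from the ground-truth width and height. The function name `inverse_aspect_ratio` is hypothetical, and the sketch covers only τ; the priority factor of Eq. (2) itself is not reproduced here.

```python
def inverse_aspect_ratio(w: float, h: float) -> float:
    """Eq. (3): tau = min(w, h) / max(w, h), so 0 < tau <= 1."""
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")
    return min(w, h) / max(w, h)


# A 100 x 20 box has aspect ratio 5:1; tau does not depend on which side is longer.
tau_wide = inverse_aspect_ratio(100, 20)
tau_tall = inverse_aspect_ratio(20, 100)
```

Under this definition, a square box has τ = 1, and the slender objects targeted in this paper (aspect ratio of at least 3:1) have τ ≤ 1/3.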

Loss Function
Slender objects detected by the proposed SRPAR involve classification and localization in the prediction process. Thus, the loss function can be divided into two components: classification loss and regression loss.
The classification loss L_cls is the focal loss, 21 which is defined as shown in Eq. (4). In Eq. (4), p(x, y) indicates the prediction point; β is the indicator function, which is 1 if the point (x, y) falls into the corresponding real object box and 0 otherwise; and ξ, which is larger than 0, is the adjustable factor. When the object is being located, if the five variables t, b, l, r, θ′ are regressed together, due to the angular periodicity, there is a tendency for angular loss discontinuities to occur at the boundary during regression. Therefore, t, b, l, r and the predicted angle θ′ are regressed separately.
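The boundary discontinuity mentioned above can be made concrete with a short sketch. It compares a naive absolute angle difference with a difference that respects the period of π implied by the long-edge angle range [−π/2, π/2). This is a generic illustration of angular periodicity and is an assumption, not the paper's Eq. (6); both function names are hypothetical.

```python
import math


def naive_angle_error(theta_pred: float, theta_gt: float) -> float:
    """Plain absolute difference; it becomes large when the two angles straddle the boundary."""
    return abs(theta_pred - theta_gt)


def periodic_angle_error(theta_pred: float, theta_gt: float) -> float:
    """Absolute difference wrapped to a period of pi, so the result lies in [0, pi/2]."""
    d = abs(theta_pred - theta_gt) % math.pi
    return min(d, math.pi - d)


# Two nearly identical orientations on opposite sides of the +/- pi/2 boundary.
pred, gt = math.pi / 2 - 0.01, -math.pi / 2 + 0.01
naive = naive_angle_error(pred, gt)        # close to pi
wrapped = periodic_angle_error(pred, gt)   # close to 0.02
```

The two boxes differ by only about 0.02 rad in orientation, yet the naive difference is close to π, which is the kind of jump in the loss that motivates treating the angle separately from t, b, l, and r.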
For ResNet networks, L2 loss converges much faster than L1 loss, 15 and hence, L2 loss is chosen as the loss function for t, b, l, and r during regression. In addition, an aspect ratio priority factor is designed as a penalty weight for long-edge regression. It helps to improve the accuracy of the regression for slender objects, as confirmed in the ablation study in Sec. 4.1.
L_reg is chosen as the loss of the four quantities t, b, l, and r during regression, which is defined as shown in Eq. (5)
L_reg((t, b, l, r), (w, h)) =
In Eq. (5), Δ_priority is the defined aspect ratio priority factor; t, b, l, and r are the distances between the prediction point and the bounding box's top, bottom, left, and right edges, respectively; and w and h are the width and height of the ground-truth box, respectively. L_mr is chosen as the loss for the angular regression, which is defined in Eq. (6)
L_mr(θ′, θ) = min
The angular regression loss L_mr is the l_mr loss. 22 In Eq.
In Eq. (7), N_pos denotes the number of positive samples. A positive sample occurs when the point p(x, y) falls into the corresponding ground-truth box. The hyper-parameters λ_1, λ_2, and λ_3 control the trade-off and are all set to 1 by default.

Experiments
All experiments were conducted on the Ubuntu 18.04 operating system, with the PyTorch deep-learning framework and a Quadro P4000 graphics processing unit. The experiments were carried out on the DOTA dataset and the self-made COCO-Stick dataset, and ResNet-101 22 was employed as the backbone network. On the DOTA dataset, the network was trained with the Adam stochastic optimization method for 6k iterations, with an initial learning rate of 1.25e-3 and a mini-batch of 10 images. The learning rate was reduced by a factor of 10 at iterations 3.6k and 4.8k. The weight decay was set to 1e-4. On the self-made COCO-Stick dataset, the network was trained with Adam for 3k iterations with an initial learning rate of 1.25e-4. The learning rate was reduced by a factor of 10 at iterations 1.8k and 2.4k, and the other training parameters were kept the same as before.
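The step schedule described above can be written as a small function of the iteration index. This is a minimal sketch in plain Python; the function name `learning_rate` and the convention that each reduction takes effect exactly at the listed iteration are assumptions, and the training code itself is not given in the text.

```python
from typing import Sequence


def learning_rate(iteration: int, base_lr: float, milestones: Sequence[int],
                  gamma: float = 0.1) -> float:
    """Step schedule: multiply base_lr by gamma once for every milestone already reached."""
    drops = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** drops


# Settings quoted in the text for the two datasets.
DOTA = dict(base_lr=1.25e-3, milestones=(3600, 4800))        # 6k iterations in total
COCO_STICK = dict(base_lr=1.25e-4, milestones=(1800, 2400))  # 3k iterations in total

for it in (0, 3599, 3600, 4800, 5999):
    print(it, learning_rate(it, **DOTA))
```

In PyTorch, an equivalent schedule is commonly obtained with `torch.optim.lr_scheduler.MultiStepLR` using the same milestones and `gamma=0.1`, although whether the experiments used that scheduler is not stated.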

Ablation Study
To address the problem of fuzzy samples arising from overlapping object prediction boxes, FPN 11 multi-level prediction was deployed to limit the range of bounding boxes and corresponding angle regressions at each level. To verify the effectiveness of our method, this experiment compared the detection results of the original network and the network after adopting FPN 11 multi-level prediction on the DOTA 12 dataset, which contains 15 categories. The model evaluation indicator is the fuzzy sample rate, which was quoted from Tian et al.'s work. 2 In Table 1, Dataset denotes the dataset employed, and With/FPN denotes FPN 11 multi-level prediction. The experimental results are shown in Table 1. When FPN 11 multi-level prediction was adopted in the original network, the fuzzy sample ratio for all objects in the DOTA 12 dataset decreased from 13% to 3.3%. The experimental results show that the number of fuzzy samples is significantly reduced, after FPN 11 multilevel prediction was utilized to limit the range of bounding boxes and corresponding angle regressions at each level.
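To make the fuzzy sample ratio concrete, the sketch below counts, among all locations that fall inside at least one box, the fraction that fall inside two or more. It is a simplified illustration under stated assumptions: boxes are axis-aligned `(x0, y0, x1, y1)` tuples, no per-level distance restriction is applied, and the function name `fuzzy_sample_ratio` is hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), axis-aligned (assumption)


def fuzzy_sample_ratio(points: Iterable[Tuple[float, float]], boxes: Sequence[Box]) -> float:
    """Fraction of positive locations (inside >= 1 box) that are fuzzy (inside >= 2 boxes)."""
    positives = 0
    fuzzy = 0
    for x, y in points:
        hits = sum(1 for (x0, y0, x1, y1) in boxes if x0 <= x <= x1 and y0 <= y <= y1)
        if hits >= 1:
            positives += 1
        if hits >= 2:
            fuzzy += 1
    return fuzzy / positives if positives else 0.0


boxes = [(0, 0, 10, 10), (5, 5, 15, 15)]
points = [(2, 2), (7, 7), (12, 12), (20, 20)]
ratio = fuzzy_sample_ratio(points, boxes)  # one of the three positive points is fuzzy
```

Restricting each feature level to its own distance range, as described in the FPN multi-level prediction subsection, removes many of the overlaps counted here, which is the effect reported in Table 1.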
To solve the problem of the long and short edges of slender objects not decaying at the same rate during regression, an aspect ratio priority factor was designed to guide the regression of slender rotating objects. To verify the effectiveness of the introduced aspect ratio factor, this experiment compared the detection results of the original network and the network after the addition of the aspect ratio factor on the DOTA 12 dataset. The evaluation metrics of the model are the mean average precision (mAP) and recall.
The experimental results are shown in Table 2. Dataset denotes the dataset used, and With/ priority denotes the aspect ratio priority factor introduced into the network. As can be seen from Table 2, when the aspect ratio factor was introduced into the original network, the mAP and recall of all objects in the DOTA 12 dataset increased by 3.1% and 14%, respectively. The mAP and recall values of the objects in the network increased, after the introduction of the aspect ratio factor. The experimental results show that the introduction of an aspect ratio factor in the network can improve the regression accuracy of slender objects.

DOTA
The detection objects of the proposed model are slender rotating objects with aspect ratios of no less than 3:1. Ships with large aspect ratios in the DOTA dataset meet this requirement, and hence ships were chosen as the detection objects on the DOTA dataset. Ship objects were extracted from 1441 raw images of the training set in the DOTA 12 dataset and cropped to pieces of size 600 × 400, yielding 12,644 images containing only ships. The txt label files corresponding to these images were then converted into xml label files following the five-parameter method. Furthermore, these ships have aspect ratios in the ranges 5:1 to 10:1, 3:1 to 5:1, and 1:1 to 3:1, and the ships in each range account for 2.27%, 24.36%, and 73.37% of the total number of ships in the DOTA 12 training set, respectively. In this paper, the model evaluation indicator is AP, which is the area under the curve obtained by plotting precision against recall. Moreover, AP_sr refers to the AP for objects with aspect ratios between 1:1 and 3:1, and AP_mr and AP_lr refer to the AP for objects with aspect ratios within 3:1 to 5:1 and 5:1 to 10:1, respectively. SCRDet, 16 RSDet, 22 RRPN, 23 and R3Det 10 were selected as comparative algorithms. The comparison results of the different algorithms are shown in Table 3.
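The three aspect ratio ranges used for AP_sr, AP_mr, and AP_lr can be expressed as a simple binning function. The text does not say to which range an object exactly on a boundary (3:1 or 5:1) belongs, so assigning it to the more slender range is an assumption here, as is the function name `aspect_ratio_bin`.

```python
from typing import Optional


def aspect_ratio_bin(w: float, h: float) -> Optional[str]:
    """Map a box to 'sr' (1:1 to 3:1), 'mr' (3:1 to 5:1), or 'lr' (5:1 to 10:1).

    Boundary ratios go to the more slender bin (assumption); ratios above 10:1
    fall outside the three ranges and return None.
    """
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")
    ratio = max(w, h) / min(w, h)
    if ratio < 3:
        return "sr"
    if ratio < 5:
        return "mr"
    if ratio <= 10:
        return "lr"
    return None
```

With this grouping, each ground-truth ship can be tagged once, and AP can then be evaluated separately on each subset.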
The data in Table 3 show that the AP of the proposed SRPAR is higher than that of the other four detection algorithms for typical slender objects with large aspect ratios, such as ships. Furthermore, AP_sr, AP_mr, and AP_lr were also measured for ships with different aspect ratios using SCRDet, 16 RSDet, 22 RRPN, 23 and R3Det, 10 and the results are shown in Table 3. The AP_sr of the proposed SRPAR for ship detection is the second highest. In addition, the proposed SRPAR has the highest AP_mr and AP_lr for ship detection of all of the compared algorithms. The AP_lr of ship detection is low for the proposed SRPAR and the other four algorithms because of the large difference in the decay rates of the long and short sides of ships with aspect ratios ranging from 5:1 to 10:1, which leads to inaccurate positioning of the ship objects in the regression and ultimately reduces their regression accuracy. These comparison results show that the proposed SRPAR effectively detects slender rotating objects with large aspect ratios in remote sensing images.
Compared with the number of ships with other aspect ratios, the number of ships in the DOTA 12 dataset with an aspect ratio of 1:1 to 3:1 is higher. For objects with different aspect ratios, there is a sample imbalance problem, which should be studied further in the future.
Ship detection results of proposed SRPAR on the DOTA 12 are shown in Fig. 5. When the aspect ratio of the ship object is large, the trained model can still detect it, reflecting that the method proposed is effective in detecting slender rotating objects.

COCO-Stick
The model is designed to meet the detection needs of specific types of objects, such as sticks, which are slender objects with an aspect ratio of at least 3:1. Although experimental results on the DOTA 12 dataset have shown that the proposed model is good at detecting slender rotating objects, the model had not yet been tested on stick objects. Therefore, a stick dataset that meets SRPAR's detection requirements needed to be created; in such a dataset, the aspect ratios of objects should mainly lie in the ranges 3:1 to 5:1 and 5:1 to 10:1, and objects with aspect ratios between 1:1 and 3:1 should account for only a very small part of the dataset. At present, such datasets are lacking. For this purpose, 500 rotating stick images with resolutions of 800 × 800 to 1200 × 1200 were collected and cropped to pieces of size 600 × 400, yielding 2859 images. The five-parameter method was utilized to label them, resulting in a new dataset called Stick. The Stick dataset has several characteristics. The objects have large aspect ratios. Objects in the Stick dataset have aspect ratios in the ranges 5:1 to 10:1, 3:1 to 5:1, and 1:1 to 3:1, as shown in Table 4.
The baseball bat subset of the COCO 1 dataset contains some baseball bats that meet the detection requirements of the proposed model for slender rotating stick objects. Baseball bat objects were extracted from raw images in the COCO 1 dataset and cropped to pieces of size 600 × 400, yielding 2756 images, which were subsequently labeled using the five-parameter method. The Stick dataset was added to the baseball bat subset to form a new self-made COCO-Stick dataset. The self-made COCO-Stick dataset has 5615 images of stick-like objects.
The self-made COCO-Stick dataset has several characteristics as well. The objects have large aspect ratios. Furthermore, objects in the self-made COCO-Stick dataset have aspect ratios in the ranges 5:1 to 10:1, 3:1 to 5:1, and 1:1 to 3:1, as shown in Table 5. Objects with an aspect ratio of at least 3:1 account for 79.69% of the dataset, the majority, and satisfy SRPAR's detection requirements.
SCRDet, 16 RSDet, 22 RRPN, 23 and R3Det 10 were used as comparison algorithms for the proposed model, and the results of the different algorithms on the self-made COCO-Stick dataset are shown in Table 6. The AP, AP_sr, and AP_mr values of our model for stick detection are higher than the corresponding values of the other four algorithms. The AP_lr value for stick detection is the highest among these comparative algorithms, and this value best reflects the performance of the model in detecting objects with large aspect ratios. The experimental results show that the proposed model has some advantages on the self-made COCO-Stick dataset.
Note: Bold values also represent the optimal values of the five algorithms compared in the table for the detection of targets with different aspect ratios.
Detection results of the proposed SRPAR on the self-made COCO-Stick dataset are shown in Fig. 6. The top two images in Fig. 6 are detection images from the baseball bat set, and the bottom two images in Fig. 6 are detection images from the Stick dataset. When detecting stick-like objects having large aspect ratios, SRPAR detects them well, indicating that the proposed SRPAR performs well on the self-made COCO-Stick dataset.
Fig. 6 Detection results on the self-made COCO-Stick dataset.

Conclusion
When it comes to detecting specific types of slender objects, the anchor-free object detector faces several problems, including large aspect ratios, overlapping bounding boxes, and the lack of appropriate datasets. A new object detection method is proposed to address these problems. First, the aspect ratio priority factor was designed to solve the problem of inconsistent decay rates between the long and short sides of slender objects during regression. Second, FPN multi-level prediction was utilized to limit the range of bounding boxes and corresponding angle regressions at each level, so that the number of fuzzy samples was significantly reduced. Finally, the self-made COCO-Stick dataset was created to further evaluate SRPAR's performance in detecting slender rotating objects. To more intuitively verify SRPAR's performance in detecting slender rotating objects with varying aspect ratios, these objects were classified into three categories by aspect ratio: 1:1 to 3:1, 3:1 to 5:1, and 5:1 to 10:1. Experiments were conducted with the DOTA and self-made COCO-Stick datasets. Experimental results on the DOTA dataset reveal that SRPAR outperforms several advanced object detection algorithms currently available in terms of detection accuracy. The performance of the method was then further validated on the self-made COCO-Stick dataset. However, the training samples in the DOTA dataset employed by SRPAR are imbalanced in aspect ratio. Further studies need to be conducted to improve SRPAR's performance in detecting slender rotating objects.