CVPR'23: 99 New Papers Organized by Research Direction | Covering Neural Network Structure Design, Medical Imaging, Image Dehazing, and More
The CVPR 2023 decisions are out: 2,360 papers were accepted this year, for an acceptance rate of 25.78%. Ahead of the official CVPR 2023 conference, and to help readers keep up with the latest computer vision research, 極市 is tracking the newest CVPR 2023 papers, with per-direction paper and code roundups as well as live technical talks on selected papers.
The CVPR 2023 papers-by-direction roundup is continuously updated in the 極市 community and now covers 919 papers. Project page: https://www.cvmart.net/community/detail/7422
Below are the most recently added CVPR 2023 papers, covering neural network structure design, medical imaging, ReID, image dehazing, anomaly detection, and other directions.
Download: https://www.cvmart.net/community/detail/7520
2D Object Detection
[1]Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision
paper:https://arxiv.org/abs/2304.01484
code:https://github.com/xinyiying/lesps
[2]Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains
paper:https://arxiv.org/abs/2304.02950
[3]Continual Detection Transformer for Incremental Object Detection
paper:https://arxiv.org/abs/2304.03110
[4]DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
paper:https://arxiv.org/abs/2304.04514
[5]Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection
paper:https://arxiv.org/abs/2304.05098
3D Object Detection
[1]Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
paper:https://arxiv.org/abs/2304.01464
code:https://github.com/azhuantou/hssda
[2]Curricular Object Manipulation in LiDAR-based Object Detection
paper:https://arxiv.org/abs/2304.04248
code:https://github.com/zzy816/com
Human-Object Interaction (HOI) Detection
[1]Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
paper:https://arxiv.org/abs/2304.03184
[2]Relational Context Learning for Human-Object Interaction Detection
paper:https://arxiv.org/abs/2304.04997
Anomaly Detection
[1]Robust Outlier Rejection for 3D Registration with Variational Bayes
paper:https://arxiv.org/abs/2304.01514
code:https://github.com/jiang-hb/vbreg
[2]Video Event Restoration Based on Keyframes for Video Anomaly Detection
paper:https://arxiv.org/abs/2304.05112
Semantic Segmentation
[1]DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation
paper:https://arxiv.org/abs/2304.02222
code:https://github.com/fy-vision/diga
[2]Exploiting the Complementarity of 2D and 3D Networks to Address Domain-Shift in 3D Semantic Segmentation
paper:https://arxiv.org/abs/2304.02991
code:https://github.com/cvlab-unibo/mm2d3d
[3]Federated Incremental Semantic Segmentation
paper:https://arxiv.org/abs/2304.04620
code:https://github.com/jiahuadong/fiss
[4]Continual Semantic Segmentation with Automatic Memory Sample Selection
paper:https://arxiv.org/abs/2304.05015
Depth Estimation
[1]EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation
paper:https://arxiv.org/abs/2304.03369
[2]DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium
paper:https://arxiv.org/abs/2304.03560
code:https://github.com/antabangun/dualrefine
Human Parsing / Human Pose Estimation
[1]A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
paper:https://arxiv.org/abs/2304.03635
code:https://github.com/changlongjianggit/a2j-transformer
[2]Monocular 3D Human Pose Estimation for Sports Broadcasts using Partial Sports Field Registration
paper:https://arxiv.org/abs/2304.04437
code:https://github.com/tobibaum/partialsportsfieldreg_3dhpe
[3]DeFeeNet: Consecutive 3D Human Motion Prediction with Deviation Feedback
paper:https://arxiv.org/abs/2304.04496
Video Processing
[1]BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
paper:https://arxiv.org/abs/2304.02225
code:https://github.com/junheum/biformer
Super Resolution
[1]Better "CMOS" Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution
paper:https://arxiv.org/abs/2304.03542
Image Restoration / Image Enhancement / Image Reconstruction
[1]Generative Diffusion Prior for Unified Image Restoration and Enhancement
paper:https://arxiv.org/abs/2304.01247
[2]CherryPicker: Semantic Skeletonization and Topological Reconstruction of Cherry Trees
paper:https://arxiv.org/abs/2304.04708
Image Denoising / Deblurring / Deraining & Dehazing
[1]HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
paper:https://arxiv.org/abs/2304.01686
[2]RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors
paper:https://arxiv.org/abs/2304.03994
code:https://github.com/RQ-Wu/RIDCP_dehazing
Face Recognition / Detection
[1]Gradient Attention Balance Network: Mitigating Face Recognition Racial Bias via Gradient Attention
paper:https://arxiv.org/abs/2304.02284
[2]Micron-BERT: BERT-based Facial Micro-Expression Recognition
paper:https://arxiv.org/abs/2304.03195
code:https://github.com/uark-cviu/micron-bert
Face Generation / Face Synthesis / Face Reconstruction / Face Editing
[1]Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
paper:https://arxiv.org/abs/2304.01436
[2]StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
paper:https://arxiv.org/abs/2304.02744
[3]GANHead: Towards Generative Animatable Neural Head Avatars
paper:https://arxiv.org/abs/2304.03950
Object Tracking
[1]Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
paper:https://arxiv.org/abs/2304.01893
[2]Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
paper:https://arxiv.org/abs/2304.04298
code:https://github.com/viewsetting/unsupervised_sampling_promoting
Image & Video Retrieval / Video Understanding
[1]Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
paper:https://arxiv.org/abs/2304.05173
Person Re-Identification / Detection
[1]PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification
paper:https://arxiv.org/abs/2304.01537
[2]Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
paper:https://arxiv.org/abs/2304.04205
code:https://github.com/jiawei151/sgiel_vireid
Image / Video Captioning
[1]Cross-Domain Image Captioning with Discriminative Finetuning
paper:https://arxiv.org/abs/2304.01662
code:https://github.com/facebookresearch/EGG
[2]Model-Agnostic Gender Debiased Image Captioning
paper:https://arxiv.org/abs/2304.03693
Medical Imaging
[1]Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
paper:https://arxiv.org/abs/2304.02255
[2]Deep Prototypical-Parts Ease Morphological Kidney Stone Identification and are Competitively Robust to Photometric Perturbations
paper:https://arxiv.org/abs/2304.04077
code:https://github.com/danielf29/prototipical_parts
[3]Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis
paper:https://arxiv.org/abs/2304.04579
code:https://github.com/cristianopatricio/coherent-cbe-skin
Image Generation / Image Synthesis
[1]Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
paper:https://arxiv.org/abs/2304.01816
[2]Few-shot Semantic Image Synthesis with Class Affinity Transfer
paper:https://arxiv.org/abs/2304.02321
Point Cloud
[1]MEnsA: Mix-up Ensemble Average for Unsupervised Multi Target Domain Adaptation on 3D Point Clouds
paper:https://arxiv.org/abs/2304.01554
code:https://github.com/sinashish/mensa_mtda
Scene Reconstruction / View Synthesis / Novel View Synthesis
[1]Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field
paper:https://arxiv.org/abs/2304.03526
[2]POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo
paper:https://arxiv.org/abs/2304.04038
code:https://github.com/lixiny/poem
[3]Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
paper:https://arxiv.org/abs/2304.04452
[4]Neural Lens Modeling
paper:https://arxiv.org/abs/2304.04848
[5]One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
paper:https://arxiv.org/abs/2304.05097
[6]MonoHuman: Animatable Human Neural Field from Monocular Video
paper:https://arxiv.org/abs/2304.02001
[7]GINA-3D: Learning to Generate Implicit Neural Assets in the Wild
paper:https://arxiv.org/abs/2304.02163
[8]Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes
paper:https://arxiv.org/abs/2304.03266
Text Detection / Recognition / Understanding
[1]Towards Unified Scene Text Spotting based on Sequence Generation
paper:https://arxiv.org/abs/2304.03435
Neural Network Structure Design
[1]SMPConv: Self-moving Point Representations for Continuous Convolution
paper:https://arxiv.org/abs/2304.02330
code:https://github.com/sangnekim/smpconv
CNN
[1]VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
paper:https://arxiv.org/abs/2304.01434
code:https://github.com/jaeill/CVPR23-VNE
Transformer
[1]METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
paper:https://arxiv.org/abs/2304.02211
[2]MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection
paper:https://arxiv.org/abs/2304.02767
[3]Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
paper:https://arxiv.org/abs/2304.03282
code:https://github.com/dingmyu/dependencyvit
[4]Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
paper:https://arxiv.org/abs/2304.04237
code:https://github.com/leaplabthu/slide-transformer
Graph Neural Networks (GNN)
[1]Adversarially Robust Neural Architecture Search for Graph Neural Networks
paper:https://arxiv.org/abs/2304.04168
Normalization / Regularization
[1]Delving into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
paper:https://arxiv.org/abs/2304.03937
Model Training / Generalization
[1]Re-thinking Model Inversion Attacks Against Deep Neural Networks
paper:https://arxiv.org/abs/2304.01669
[2]Improved Test-Time Adaptation for Domain Generalization
paper:https://arxiv.org/abs/2304.04494
Long-Tailed Distribution
[1]Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
paper:https://arxiv.org/abs/2304.01279
code:https://github.com/jinyan-06/shike
Visual Representation Learning
[1]HNeRV: A Hybrid Neural Representation for Videos
paper:https://arxiv.org/abs/2304.02633
code:https://github.com/haochen-rye/hnerv
Multi-Modal Learning
[1]Detecting and Grounding Multi-Modal Media Manipulation
paper:https://arxiv.org/abs/2304.02556
code:https://github.com/rshaojimmy/multimodal-deepfake
[2]Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce
paper:https://arxiv.org/abs/2304.02853
[3]Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
paper:https://arxiv.org/abs/2304.03307
code:https://github.com/talalwasim/vita-clip
Vision-Language
[1]Learning to Name Classes for Vision and Language Models
paper:https://arxiv.org/abs/2304.01830
[2]VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision
paper:https://arxiv.org/abs/2304.03135
code:https://github.com/lmy98129/vlpd
[3]CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
paper:https://arxiv.org/abs/2304.04231
code:https://github.com/dk-liang/crowdclip
[4]Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
paper:https://arxiv.org/abs/2304.04907
Scene Graph Generation
[1]Devil's on the Edges: Selective Quad Attention for Scene Graph Generation
paper:https://arxiv.org/abs/2304.03495
Visual Reasoning / VQA
[1]Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
paper:https://arxiv.org/abs/2304.03754
Datasets
[1]Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
paper:https://arxiv.org/abs/2304.02828
code:https://github.com/noagarcia/phase
Few-shot Learning / Zero-shot Learning
[1]Zero-shot Generative Model Adaptation via Image-specific Prompt Learning
paper:https://arxiv.org/abs/2304.03119
Transfer Learning / Domain Adaptation
[1]DATE: Domain Adaptive Product Seeker for E-commerce
paper:https://arxiv.org/abs/2304.03669
[2]Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer
paper:https://arxiv.org/abs/2304.04461
Continual Learning / Lifelong Learning
[1]Asynchronous Federated Continual Learning
paper:https://arxiv.org/abs/2304.03626
code:https://github.com/lttm/fedspace
[2]Exploring Data Geometry for Continual Learning
paper:https://arxiv.org/abs/2304.03931
[3]Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning
paper:https://arxiv.org/abs/2304.05288
code:https://github.com/wenjinw/par
[4]Online Distillation with Continual Learning for Cyclic Domain Shifts
paper:https://arxiv.org/abs/2304.01239
Visual Localization / Pose Estimation
[1]OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
paper:https://arxiv.org/abs/2304.02009
Incremental Learning
[1]On the Stability-Plasticity Dilemma of Class-Incremental Learning
paper:https://arxiv.org/abs/2304.01663
[2]PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning
paper:https://arxiv.org/abs/2304.04408
Reinforcement Learning
[1]Reinforcement Learning-Based Black-Box Model Inversion Attacks
paper:https://arxiv.org/abs/2304.04625
Meta Learning
[1]Meta-causal Learning for Single Domain Generalization
paper:https://arxiv.org/abs/2304.03709
[2]Meta Compositional Referring Expression Segmentation
paper:https://arxiv.org/abs/2304.04415
[3]Meta-Learning with a Geometry-Adaptive Preconditioner
paper:https://arxiv.org/abs/2304.01552
code:https://github.com/suhyun777/cvpr23-gap
Semi-Supervised / Weakly Supervised / Unsupervised / Self-Supervised Learning
[1]Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model
paper:https://arxiv.org/abs/2304.03572
[2]Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
paper:https://arxiv.org/abs/2304.04175
[3]SOOD: Towards Semi-Supervised Oriented Object Detection
paper:https://arxiv.org/abs/2304.04515
code:https://github.com/hamperdredes/sood
[4]Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
paper:https://arxiv.org/abs/2304.01482
code:https://github.com/ucdvision/patchsearch
Neural Network Interpretability
[1]Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning
paper:https://arxiv.org/abs/2304.04824
Image Counting
[1]Density Map Distillation for Incremental Object Counting
paper:https://arxiv.org/abs/2304.05255
Others
[1]Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification
paper:https://arxiv.org/abs/2304.01804
code:https://github.com/youngwk/bridgegapexplanationpamc
[2]Knowledge Combination to Learn Rotated Detection Without Rotated Annotation
paper:https://arxiv.org/abs/2304.02199
[3]CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition
paper:https://arxiv.org/abs/2304.03167
[4]DC2: Dual-Camera Defocus Control by Learning to Refocus
paper:https://arxiv.org/abs/2304.0328