DEAMS Research Paper Series 2024, 4

Cluster based oversampling for imbalanced classification

Di Credico, Gioia • Torelli, Nicola
2024
ISBN
978-88-5511-534-6
https://www.openstarts.units.it/handle/10077/36208
  • Working Paper

Abstract
Oversampling is a widespread remedy for class imbalance in classification problems. Some oversampling techniques generate new minority-class cases that are similar to the observed ones. ROSE (Random Over Sampling Examples) is an algorithm that generates new data in both the minority and majority classes using kernel density estimation and bootstrap resampling. In practical applications of ROSE, fine-tuning the smoothing parameter of the kernel density estimate is advisable, especially for the rare class, and particularly when the rare class is characterized by well-separated subgroups. We propose a new strategy, ROSEclust, which pairs density-based clustering methods with ROSE to deal with a strongly skewed class distribution and with grouping within the rare class. Evidence from simulation studies and real data applications shows that the new approach solves some issues that ROSE encounters with complex class data structures. The synthetic data distribution is closer to the original one, and the predictive performance of classification methods applied to synthetic data is not compromised. The entire procedure is designed to be free from parameter tuning. The ROSEclust strategy therefore extends the applicability of ROSE and automates the data-balancing step, leaving more room for the modelling step.
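
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel with a diagonal, normal-reference bandwidth, assumes that cluster labels for the rare class have already been produced by some density-based clustering step (not shown), and uses hypothetical function names. It draws synthetic points by resampling an observed point and adding kernel noise, either with one bandwidth for the whole class or with a separate bandwidth inside each cluster.

```python
import math
import random
import statistics
from collections import defaultdict


def normal_reference_bandwidths(points):
    """Per-feature Gaussian bandwidths from a normal-reference rule (an assumption)."""
    n, d = len(points), len(points[0])
    if n < 2:
        # A single point has no spread to estimate: fall back to plain replication.
        return [0.0] * d
    factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    return [factor * statistics.stdev(col) for col in zip(*points)]


def smoothed_bootstrap(points, n_new, rng):
    """Resample an observed point, then perturb it with Gaussian kernel noise."""
    h = normal_reference_bandwidths(points)
    synthetic = []
    for _ in range(n_new):
        center = rng.choice(points)
        synthetic.append([rng.gauss(c, s) for c, s in zip(center, h)])
    return synthetic


def clusterwise_smoothed_bootstrap(points, labels, n_new, rng):
    """Apply the smoothed bootstrap separately inside each given cluster.

    `labels` is assumed to come from a density-based clustering of `points`.
    Clusters are drawn with probability proportional to their size.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    keys = sorted(groups)
    sizes = [len(groups[k]) for k in keys]
    draws = rng.choices(keys, weights=sizes, k=n_new)
    synthetic = []
    for k in keys:
        count = draws.count(k)
        if count:
            synthetic.extend(smoothed_bootstrap(groups[k], count, rng))
    return synthetic
```

When the rare class consists of two well-separated subgroups, the class-wide standard deviation is dominated by the distance between the subgroups, so a single bandwidth scatters synthetic points into the empty region between them. Computing the bandwidth within each cluster keeps the synthetic points close to the observed subgroups, which is the behaviour the abstract describes as a synthetic distribution closer to the original one.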
Series
DEAMS Research Paper Series 
Subjects
  • Density-based cluster...
  • tuning parameters
  • resampling
  • ROSE
  • SMOTE

Source
Gioia Di Credico and Nicola Torelli, "Cluster based oversampling for imbalanced classification", Trieste, EUT Edizioni Università di Trieste, 2024
Languages
en
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Licence
http://creativecommons.org/licenses/by-nc-nd/4.0/
File(s)
Name
DiCredicoROSEclust-Wpaper.pdf
Format
Adobe PDF
Size
10.03 MB