Interpreting Model Performance
In a financial dataset with a 0.5% fraud rate, which metric best assesses a classifier’s ability to detect fraud, and why is 'accuracy' misleading in this scenario?
- A. Precision, because it measures the percentage of predicted fraud cases that are truly fraudulent
- B. Recall, because it measures the percentage of actual frauds correctly detected
- C. F1-score, as it balances both precision and recall in the evaluation
- D. Accuracy, because it shows the overall correct predictions including both fraud and non-fraud
- E. Area Under the ROC Curve (AUC-ROC), which is not impacted by class imbalance
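This is a short illustration of why accuracy misleads at this fraud rate. Take a hypothetical set of 10,000 transactions containing 50 frauds, matching the 0.5% rate in the question. A "model" that never predicts fraud still reaches 99.5% accuracy, yet its recall is zero because it catches none of the fraud. The toy data and the helper functions are illustrative assumptions, not any particular library's API.

```python
# Hypothetical toy data: 10,000 transactions with a 0.5% fraud rate (50 frauds).
n_total = 10_000
n_fraud = 50
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A trivial "classifier" that labels every transaction as legitimate (0).
y_pred = [0] * n_total

def accuracy(y_true, y_pred):
    # Fraction of all predictions that match the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred):
    # Fraction of actual frauds that the model flagged as fraud.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.995
print(f"recall   = {recall(y_true, y_pred):.3f}")    # recall   = 0.000
```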
Handling Imbalanced Classes
Suppose a dataset for fraud detection has a 1:200 ratio of fraudulent to legitimate transactions. Which technique most effectively addresses this class imbalance during training?
- A. Downsampling the majority class to match the minority class
- B. Removing outlier points entirely
- C. Randomly duplicating all data points for the minority class without modification
- D. Shuffling the dataset before training
- E. Using the original unbalanced dataset with no changes
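Below is a minimal sketch of the random downsampling described in option A. It assumes each transaction is a dict carrying a hypothetical binary `is_fraud` label, and that class 1 is the minority. It uses only Python's standard library, and the 1:200 toy data is invented for illustration.

```python
import random

def downsample_majority(rows, label_key="is_fraud", seed=0):
    """Randomly drop majority-class rows (label 0) until both classes
    have the same count. Assumes label 1 is the minority class."""
    rng = random.Random(seed)  # seeded for reproducibility
    minority = [r for r in rows if r[label_key] == 1]
    majority = [r for r in rows if r[label_key] == 0]
    # Keep only as many majority rows as there are minority rows.
    kept_majority = rng.sample(majority, k=len(minority))
    balanced = minority + kept_majority
    rng.shuffle(balanced)  # avoid all frauds sitting at the front
    return balanced

# Hypothetical 1:200 dataset: 5 frauds, 1,000 legitimate transactions.
rows = [{"amount": float(i), "is_fraud": 1} for i in range(5)]
rows += [{"amount": float(i), "is_fraud": 0} for i in range(1000)]

balanced = downsample_majority(rows)
print(len(balanced), sum(r["is_fraud"] for r in balanced))  # 10 5
```

In practice you would apply this only to the training split, so that evaluation still reflects the real 1:200 ratio. The trade-off is that it discards most legitimate examples. That is why oversampling and class weighting are also common choices.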
Feature Engineering in Fraud Detection
When building a machine learning model to detect fraudulent transactions, which of the following feature engineering steps can help capture sudden changes in a user’s behavior?
- A. Calculating the moving average of transaction amounts per day
- B. Encoding user IDs as raw numerical values
- C. Removing all categorical variables related to geography
- D. Replacing timestamps with random numbers
- E. Scaling all features to the range [0, 1] without investigating distributions
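A moving average like the one in option A can be turned into a "sudden change" signal. One way is to divide each day's spend by the average of the preceding days. The sketch below assumes one user's daily totals are already aggregated in chronological order. The function name and the 7-day window are hypothetical choices.

```python
from collections import deque

def deviation_from_moving_average(daily_amounts, window=7):
    """For each day, return that day's amount divided by the mean of the
    previous `window` days; None until enough history exists (or if the
    moving average is zero)."""
    history = deque(maxlen=window)  # keeps only the last `window` days
    features = []
    for amount in daily_amounts:
        if len(history) == window:
            moving_avg = sum(history) / window
            features.append(amount / moving_avg if moving_avg else None)
        else:
            features.append(None)
        history.append(amount)  # today becomes history for tomorrow
    return features

# Hypothetical user: steady spending near 100/day, then a spike to 900.
amounts = [100, 110, 90, 100, 105, 95, 100, 100, 900]
print(deviation_from_moving_average(amounts, window=7))
# [None, None, None, None, None, None, None, 1.0, 9.0]
```

A ratio near 1.0 means typical behavior for this user. The 9.0 on the last day marks the abrupt change that a fraud model could learn from.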
Unsupervised Techniques for Unknown Frauds
If labels for fraudulent transactions are unavailable, which machine learning approach is best suited to detecting potential new types of fraud?
- A. Logistic regression on labeled data
- B. Decision trees with grid search
- C. Unsupervised anomaly detection using clustering algorithms
- D. Oversampling using SMOTE
- E. Stacking multiple supervised classifiers
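Here is a minimal sketch of clustering-based anomaly detection (option C), which needs no labels. It fits k-means on unlabeled points and scores each point by its distance to the nearest centroid. Any point whose score exceeds the mean by more than two standard deviations gets flagged. Several pieces are illustrative assumptions:

- the two-feature toy data;
- the naive "first k points" initialization;
- the 2-sigma threshold.

Real systems would typically scale features and use k-means++ initialization or a library implementation.

```python
import math
import statistics

def kmeans(points, k, iters=20):
    """Plain k-means. Naive deterministic init: the first k points
    (an assumption for reproducibility; k-means++ is more common)."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids

def anomaly_scores(points, centroids):
    # Distance from each point to its closest centroid.
    return [min(math.dist(p, c) for c in centroids) for p in points]

# Hypothetical unlabeled transactions as 2-D feature vectors:
# two groups of normal behavior plus one unusual point at the end.
points = [(1.0, 1.0), (5.0, 5.0), (1.1, 0.9), (5.1, 4.9), (0.9, 1.1),
          (4.9, 5.1), (1.0, 1.2), (5.0, 5.2), (1.2, 1.0), (5.2, 5.0),
          (9.0, 1.0)]

centroids = kmeans(points, k=2)
scores = anomaly_scores(points, centroids)
threshold = statistics.mean(scores) + 2 * statistics.pstdev(scores)
flagged = [i for i, s in enumerate(scores) if s > threshold]
print(flagged)  # [10]
```

The unusual point gets absorbed into one cluster and pulls its centroid toward itself, yet it is still flagged here. With real data, the choice of k and the threshold strongly affects what gets flagged.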
Concept Drift and Model Updating
In fraud detection systems, what is the primary risk if a model trained on last year’s data is not periodically updated, especially when fraud patterns evolve over time?
- A. The model may start predicting only legitimate transactions as fraud
- B. The model could suffer from concept drift, leading to reduced effectiveness
- C. The model’s computational cost will increase automatically
- D. The model’s recall will improve continually
- E. The model will eventually require more storage space than the dataset
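Concept drift shows up as degrading detection performance, so one simple guard is to recompute recall on each new window of confirmed labels. The model is then flagged for retraining when that recall falls too far below its recall at deployment. Several values below are hypothetical: the windows, the 0.75 baseline, and the 0.10 tolerance. The sketch also assumes that confirmed labels for recent transactions eventually become available.

```python
def recall(y_true, y_pred):
    # Fraction of actual frauds (label 1) that the model flagged.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return tp / positives if positives else 0.0

def needs_retraining(baseline_recall, window_true, window_pred, tolerance=0.10):
    """Return (flag, current_recall). The flag is True when recall on the
    recent window falls more than `tolerance` below the baseline."""
    current = recall(window_true, window_pred)
    return current < baseline_recall - tolerance, current

# Hypothetical labeled windows: (true labels, model predictions).
windows = {
    "month_1": ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0]),  # 3 of 4 frauds caught
    "month_6": ([1, 1, 1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0]),  # 1 of 4 caught
}
baseline = 0.75
for name, (y_true, y_pred) in windows.items():
    drifted, current = needs_retraining(baseline, y_true, y_pred)
    print(name, f"recall={current:.2f}", "retrain" if drifted else "ok")
# month_1 recall=0.75 ok
# month_6 recall=0.25 retrain
```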