
Feature Engineering

The process of transforming raw data into more informative inputs (features) for ML models.

Goes beyond encoding: you can create new features/metrics (like derived columns in the DB world) that pure encoding does not offer.

The goal of FE is to improve model accuracy, interpretability, and generalization.

Example (Laptop Sales):

Purchase Date = 2025-09-02

Derived Features:

  • Month = 09
  • DayOfWeek = Tuesday
  • IsHolidaySeason = No
  • IsWeekend = No
  • IsLeapYear = No
  • Quarter = Q3
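
The derived features above can be computed directly from the date. The following is a minimal sketch using only the Python standard library; the function name `date_features` and the rule that "holiday season" means November or December are assumptions made for illustration, not definitions from these notes.

```python
import calendar
from datetime import date

# Explicit names avoid locale-dependent output from strftime("%A").
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def date_features(d: date) -> dict:
    """Derive simple calendar features from a purchase date."""
    return {
        "Month": d.month,
        "DayOfWeek": DAY_NAMES[d.weekday()],      # weekday(): Monday == 0
        # Assumption: holiday season = November or December.
        "IsHolidaySeason": d.month in (11, 12),
        "IsWeekend": d.weekday() >= 5,            # Saturday or Sunday
        "IsLeapYear": calendar.isleap(d.year),
        "Quarter": f"Q{(d.month - 1) // 3 + 1}",
    }


features = date_features(date(2025, 9, 2))
for name, value in features.items():
    print(f"{name} = {value}")
```

The holiday-season rule is the part most likely to change per business: a retailer might instead flag specific date ranges or regional holidays.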

Encoding (One-Hot, Label, Target) only turns categories into numbers; it does not create new information.

But real-world data often hides useful patterns in dates, interactions, domain knowledge, or semantics.

ID  Product  Purchase Date  Price  PurchasedAgain
1   Laptop   2023-12-01     1200   1
2   Laptop   2024-07-15     1100   0
3   Phone    2024-05-20      800   1
4   Tablet   2024-08-05      600   1

  • Encoding only handles Product → One-Hot or Target.
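
To make the limit of encoding concrete, here is a small sketch of what One-Hot and Target encoding produce for the Product column of the table. It uses plain Python rather than a library encoder, and the target encoding shown is the naive per-category mean of PurchasedAgain, which would leak the label if computed on the same rows used for training.

```python
from collections import defaultdict
from statistics import mean

products = ["Laptop", "Laptop", "Phone", "Tablet"]
purchased_again = [1, 0, 1, 1]

# One-hot: one 0/1 column per category, in a fixed (sorted) order.
categories = sorted(set(products))
one_hot = [[1 if p == c else 0 for c in categories] for p in products]

# Naive target encoding: replace each category by the mean of the target.
targets_by_product = defaultdict(list)
for p, y in zip(products, purchased_again):
    targets_by_product[p].append(y)
target_encoding = {p: mean(ys) for p, ys in targets_by_product.items()}

print(categories)
for p, row in zip(products, one_hot):
    print(p, row, target_encoding[p])
```

Both results are just numeric stand-ins for the product name; neither says anything about when the purchase happened or how the price compares with similar items.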

Feature Engineering adds new insights:

  • From Purchase Date: extract Month, DayOfWeek, IsHolidaySeason.
  • From Price: create an IsDiscounted flag (true if Price is below the average price for that product).
  • Combine features: Price / AvgCategoryPrice.
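
The price-based features can be sketched on the four rows of the table. The table has no separate category column, so this example assumes the category is the Product itself; the field names `IsDiscounted` and `PriceToAvg` are illustrative.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"ID": 1, "Product": "Laptop", "Price": 1200},
    {"ID": 2, "Product": "Laptop", "Price": 1100},
    {"ID": 3, "Product": "Phone", "Price": 800},
    {"ID": 4, "Product": "Tablet", "Price": 600},
]

# Average price per product (assumed to play the role of AvgCategoryPrice).
prices_by_product = defaultdict(list)
for r in rows:
    prices_by_product[r["Product"]].append(r["Price"])
avg_price = {p: mean(v) for p, v in prices_by_product.items()}

for r in rows:
    avg = avg_price[r["Product"]]
    r["IsDiscounted"] = r["Price"] < avg           # below the product average
    r["PriceToAvg"] = round(r["Price"] / avg, 3)   # Price / AvgCategoryPrice

for r in rows:
    print(r)
```

With only one Phone and one Tablet in the data, their ratio is exactly 1.0 and the flag is false, which shows that such group-relative features need enough rows per group to carry signal.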

Basic Feature Engineering

Improve signals/patterns without domain-specific knowledge.

Scaling/Normalization: Price → (Price – mean) / std

Date/Time Features: Purchase Date → Month=12, DayOfWeek=Friday

Polynomial/Interaction: Price × Tier
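
A rough sketch of the scaling and interaction steps on the Price column from the table. The `tiers` values are hypothetical (the table has no Tier column), and the population standard deviation is used here; a sample standard deviation would give slightly different scaled values.

```python
from statistics import mean, pstdev

prices = [1200, 1100, 800, 600]
tiers = [3, 3, 2, 1]  # hypothetical product tier per row, for illustration

# Standardization (z-score): (Price - mean) / std
mu = mean(prices)
sigma = pstdev(prices)  # population standard deviation
scaled = [(p - mu) / sigma for p in prices]

# Interaction feature: Price x Tier
price_x_tier = [p * t for p, t in zip(prices, tiers)]

print([round(z, 3) for z in scaled])
print(price_x_tier)
```

In practice the mean and standard deviation should be computed on the training split only and then reused for validation and test data, so that no information leaks from held-out rows.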

Pros:

  • Easy to implement.
  • Often helps models that are sensitive to feature scale or cannot learn interactions on their own (especially linear models and neural networks).

Cons:

  • Risk of adding noise if done blindly.
  • Limited unless combined with domain insights.

Domain-Specific Feature Engineering

Apply business/field knowledge.

Examples:

Finance: Debt-to-Income Ratio, Credit Utilization %

Healthcare: BMI = Weight (kg) / Height (m)², risk score categories
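
The finance and healthcare ratios above are simple to express in code. This sketch uses hypothetical function names and made-up input values; the BMI cut-offs are the commonly cited WHO adult thresholds, used here as one possible way to turn a score into categories.

```python
def debt_to_income(monthly_debt: float, monthly_income: float) -> float:
    """Debt-to-Income ratio: share of income spent on debt payments."""
    return monthly_debt / monthly_income


def credit_utilization_pct(balance: float, credit_limit: float) -> float:
    """Credit Utilization %: balance as a percentage of the credit limit."""
    return 100 * balance / credit_limit


def bmi(weight_kg: float, height_m: float) -> float:
    """BMI = Weight / Height^2, with weight in kg and height in metres."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Bucket a BMI value using WHO adult thresholds (an assumption here)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


# Made-up example inputs.
dti = debt_to_income(1500, 5000)
util = credit_utilization_pct(2500, 10000)
person_bmi = bmi(70, 1.75)
print(dti, util, round(person_bmi, 1), bmi_category(person_bmi))
```

Real pipelines would also need to handle zero or missing denominators (no income, no credit limit) before computing these ratios.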

IoT: Rolling averages, peak detection in sensor data.
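
For the IoT case, a minimal sketch of a trailing rolling average and a very simple peak detector. The sensor readings, window size, and threshold are invented for illustration, and the peak rule (a strict local maximum above a threshold) is only one of many possible definitions.

```python
def rolling_mean(values, window):
    """Trailing mean over the last `window` values; None until the window is full."""
    return [
        sum(values[i - window + 1:i + 1]) / window if i >= window - 1 else None
        for i in range(len(values))
    ]


def find_peaks(values, threshold):
    """Indices of strict local maxima whose value exceeds `threshold`."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] > threshold
        and values[i] > values[i - 1]
        and values[i] > values[i + 1]
    ]


readings = [20.0, 21.0, 25.0, 22.0, 21.0, 30.0, 23.0]  # made-up sensor values

rolling = rolling_mean(readings, window=3)
peaks = find_peaks(readings, threshold=24.0)
print(rolling)
print(peaks)
```

The rolling mean smooths short-term noise, while the peak indices can feed features such as the number of peaks per hour or the time since the last peak.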

Pros:

  • Captures real-world meaning, which can yield large performance gains.
  • Makes models explainable to stakeholders.

Cons:

  • Requires domain expertise.
  • Not always transferable between datasets.

#feature_engineering #domain_specific

Ver 0.3.6

Last change: 2025-12-02