[Avg. reading time: 10 minutes]
Vectors
A vector is just an ordered list of numbers that represents a data point so models can do math on it.
Think “row → numbers” for tabular data, or “text/image → numbers” after a transformation.
Example:
Price = 1200, Weight = 2 kg, Warranty = 24 months → Vector = [1200, 2, 24]
Types of Vectors
Tabular Feature Vector
Concatenate numeric columns (and encoded categoricals) into a single vector.
Designed by the ML engineer/data scientist during data prep/feature engineering (training); the same code runs at inference.
Example: [Price, Weight, Warranty] → [1200, 2, 24].
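A minimal sketch of this step in Python, using only the column names and values from the example above:

```python
import numpy as np

# One product row: price (USD), weight (kg), warranty (months)
row = {"price": 1200, "weight": 2, "warranty": 24}

# Concatenate the numeric columns into a single feature vector
feature_vector = np.array([row["price"], row["weight"], row["warranty"]], dtype=float)
print(feature_vector)  # [1200.    2.   24.]
```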
Sparse Vectors
High-dimensional vectors with many zeros (e.g., One-Hot, Bag-of-Words, TF-IDF).
Computed by an encoding/featurization function in your pipeline.
Example:
Products = {Laptop, Phone, Pen}
Laptop → [1, 0, 0]
Phone → [0, 1, 0]
Pen → [0, 0, 1]
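As a sketch, scikit-learn's OneHotEncoder produces exactly these vectors, stored in a sparse matrix so the zeros cost nothing:

```python
from sklearn.preprocessing import OneHotEncoder

products = [["Laptop"], ["Phone"], ["Pen"]]

# fit() learns the categories; transform() emits a sparse 0/1 matrix.
# Note: columns come out in alphabetical order (Laptop, Pen, Phone),
# so they may differ from the hand-written example above.
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(products)

print(encoder.categories_)  # [array(['Laptop', 'Pen', 'Phone'], dtype=object)]
print(one_hot.toarray())    # each row contains exactly one 1
```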
Dense Vectors (compact, mostly non-zeros)
A lower-dimensional, compact numeric representation in which most values are non-zero.
Created by algorithms (scalers/PCA) or models (embeddings) in your pipeline; a scaler/PCA sketch follows the example below.
Example (illustrative values, not real embeddings):
Laptop → [0.65, -0.12, 0.48]
Phone → [0.60, -0.15, 0.52]
Pen → [0.10, 0.85, -0.40]
Laptop and Phone vectors are close together.
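For the algorithmic route (scalers/PCA), here is a minimal sketch with made-up rows of [price, weight, warranty]:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Made-up rows of [price, weight, warranty] for illustration
X = np.array([
    [1200, 2.00, 24],
    [ 800, 0.30, 12],
    [   5, 0.05,  0],
    [1500, 2.50, 36],
])

# Scale so no single column dominates, then project to 2 dense dimensions
X_scaled = StandardScaler().fit_transform(X)
X_dense = PCA(n_components=2).fit_transform(X_scaled)
print(X_dense.shape)  # (4, 2) -- nearly every value is non-zero
```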
Model-Derived Feature Vectors
Dense vectors generated by models such as CNNs or Transformers. Common in computer vision and NLP: image classification, object detection, face recognition, voice processing.
Models generate them during feature extraction (training & inference).
Example: BERT sentence vector, ResNet image features.
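For the image side, a hedged sketch with torchvision: load a pretrained ResNet-18, replace its classification head with an identity, and the forward pass returns a 512-dimensional feature vector ("product.jpg" is a hypothetical file path):

```python
import torch
from torchvision import models
from PIL import Image

# Pretrained ResNet-18 with its classifier removed -> 512-dim features
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.fc = torch.nn.Identity()
model.eval()

preprocess = weights.transforms()  # the resize/normalize steps the model expects

img = Image.open("product.jpg")    # hypothetical image path
with torch.no_grad():
    features = model(preprocess(img).unsqueeze(0))
print(features.shape)  # torch.Size([1, 512])
```

The BERT-style text side of this row is exactly what the sentence-transformers example at the end of this section shows.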
| Vector Type | Who designs it? | Who computes it? | When it’s computed | Example |
|---|---|---|---|---|
| Tabular feature vector | ML Eng/DS (choose columns) | Pipeline code | Train & Inference | [Price, Weight, Warranty] |
| Sparse (One-Hot/TF-IDF) | ML Eng/DS (choose encoder) | Encoder in pipeline | Train (fit) & Inference (transform) | One-Hot Product |
| Dense (scaled/PCA) | ML Eng/DS (choose scaler/PCA) | Scaler/PCA in pipeline | Train (fit) & Inference (transform) | StandardScaled price, PCA(100) |
| Model features / Embeddings | ML Eng/DS (choose model) | Model (pretrained or trained) | Train & Inference | BERT/ResNet/categorical embedding |
MLOps ensures the same preprocessing steps run at inference as in training, avoiding train/serve skew.
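One common way to enforce this, sketched with scikit-learn: fit all preprocessing inside a single Pipeline at training time, persist it (joblib here is an assumption, not prescribed by this section), and call the same object at inference so fitted parameters cannot drift:

```python
import numpy as np
import joblib
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Toy training matrix of [price, weight, warranty] rows
X_train = np.array([[1200, 2.0, 24], [800, 0.3, 12], [5, 0.05, 0]])

# Training: fit scaler statistics and PCA components in one object
pipeline = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])
pipeline.fit(X_train)
joblib.dump(pipeline, "features.joblib")

# Inference: reload and transform() only -- the exact same steps, no refitting
serving_pipeline = joblib.load("features.joblib")
print(serving_pipeline.transform([[1000, 1.5, 18]]))
```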
Example of a Dense Vector
First, create a virtual environment and install sentence-transformers:
```bash
python -m venv .densevector
source .densevector/bin/activate
pip install sentence-transformers
```
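Then, in Python, encode a word and inspect the resulting vector: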
```python
from sentence_transformers import SentenceTransformer

# Load a pre-trained model (MiniLM is small & fast)
model = SentenceTransformer('all-MiniLM-L6-v2')

text = "Laptop"

# Convert text into a dense vector
vector = model.encode(text)

print(f"Dense vector shape for '{text}':", vector.shape)
print("Dense vector (first 10 values):", vector[:10])
```
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Words to compare
texts = ["Laptop", "Computer", "Pencil"]

# Encode all of them at once
vectors = model.encode(texts)

# encode() already returns a NumPy array; this just makes that explicit
vectors = np.array(vectors)

# Pairwise cosine similarity matrix (shape: 3 x 3)
sim_matrix = cosine_similarity(vectors)

# Display each pair's similarity score
for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        print(f"Similarity({texts[i]} vs {texts[j]}): {sim_matrix[i][j]:.4f}")
```