[Avg. reading time: 10 minutes]

Vectors

A vector is just an ordered list of numbers that represents a data point so models can do math on it.

Think “row -> numbers” for tabular data, or “text/image -> numbers” after a transformation.

Example:

Price = 1200, Weight = 2kg, Warranty = 24 months → Vector = [1200, 2, 24]
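
As a minimal sketch in Python (NumPy assumed available), that same product becomes a 3-dimensional array:

import numpy as np

# One product as a feature vector: [price, weight_kg, warranty_months]
product = np.array([1200, 2, 24])

print(product)        # [1200    2   24]
print(product.shape)  # (3,)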

Types of Vectors

Tabular Feature Vector

Concatenate numeric columns (and encoded categoricals) into a single vector.

Designed by the ML engineer/data scientist during data prep/feature engineering at training time; the same code must run again at inference.

Example: [Price, Weight, Warranty] → [1200, 2, 24].
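
A minimal sketch of that concatenation, with a hypothetical one-hot "brand" column added to show how encoded categoricals join the numeric features:

import numpy as np

# Numeric columns for one product
price, weight_kg, warranty_months = 1200, 2, 24

# Hypothetical categorical column "brand", one-hot encoded over ["Acme", "Globex"]
brand_onehot = [1, 0]  # this product's brand is "Acme"

# Concatenate everything into a single tabular feature vector
feature_vector = np.concatenate([[price, weight_kg, warranty_months], brand_onehot])
print(feature_vector)  # [1200    2   24    1    0]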

Sparse Vectors

High-dimensional vectors with many zeros (e.g., One-Hot, Bag-of-Words, TF-IDF).

Computed by an encoding/featurization function in your pipeline.

Example:

Products = {Laptop, Phone, Pen}

Laptop → [1, 0, 0]
Phone → [0, 1, 0]
Pen → [0, 0, 1]
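
A hedged sketch of the same encoding with scikit-learn's OneHotEncoder, which returns a sparse matrix; note it orders columns by sorted category name, so the column order differs from the hand-written mapping above:

from sklearn.preprocessing import OneHotEncoder

products = [["Laptop"], ["Phone"], ["Pen"]]

# fit learns the category vocabulary; transform produces a sparse matrix
encoder = OneHotEncoder()
sparse_vectors = encoder.fit_transform(products)

print(encoder.categories_)      # [array(['Laptop', 'Pen', 'Phone'], dtype=object)]
print(sparse_vectors.toarray())
# [[1. 0. 0.]   Laptop
#  [0. 0. 1.]   Phone
#  [0. 1. 0.]]  Pen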

Dense Vectors (compact, mostly non-zeros)

A lower-dimensional, compact numeric representation in which most values are non-zero.

Created by algorithms (scalers, PCA) or models (embeddings) in your pipeline.

Example (illustrative values, not actual model output):

Laptop → [0.65, -0.12, 0.48]
Phone → [0.60, -0.15, 0.52]
Pen → [0.10, 0.85, -0.40]

The Laptop and Phone vectors point in nearly the same direction because the items are semantically similar, while Pen sits far away.
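
That claim is easy to verify with cosine similarity on the illustrative vectors (a sketch; the numbers are the made-up values above, not real embeddings):

import numpy as np

# Illustrative vectors from above (not real model outputs)
laptop = np.array([0.65, -0.12, 0.48])
phone  = np.array([0.60, -0.15, 0.52])
pen    = np.array([0.10,  0.85, -0.40])

def cosine(a, b):
    # Cosine similarity: 1.0 = same direction, 0 = orthogonal, -1 = opposite
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("Laptop vs Phone:", round(cosine(laptop, phone), 3))  # ~0.996, very close
print("Laptop vs Pen:  ", round(cosine(laptop, pen), 3))    # ~-0.297, far apart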

Model-Derived Feature Vectors

Dense vectors generated by models such as CNNs or Transformers during feature extraction. Common in computer vision, speech, and NLP: image classification, object detection, face recognition, voice processing.

Models generate them during feature extraction (training & inference).

Example: BERT sentence vector, ResNet image features.
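
As one hedged sketch of model-derived image features (assuming torch and torchvision are installed; the image path is hypothetical), ResNet-18 with its classifier head removed yields a 512-dimensional dense vector per image:

import torch
from torchvision import models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.fc = torch.nn.Identity()  # drop the classifier head; keep the 512-d features
model.eval()

preprocess = weights.transforms()       # the preprocessing these weights expect

image = Image.open("product.jpg")       # hypothetical image path
batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    features = model(batch)

print(features.shape)  # torch.Size([1, 512])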

Vector Type                  | Who designs it?               | Who computes it?              | When it's computed                   | Example
-----------------------------|-------------------------------|-------------------------------|--------------------------------------|----------------------------------
Tabular feature vector       | ML Eng/DS (choose columns)    | Pipeline code                 | Train & Inference                    | [Price, Weight, Warranty]
Sparse (One-Hot/TF-IDF)      | ML Eng/DS (choose encoder)    | Encoder in pipeline           | Train (fit) & Inference (transform)  | One-Hot Product
Dense (scaled/PCA)           | ML Eng/DS (choose scaler/PCA) | Scaler/PCA in pipeline        | Train (fit) & Inference (transform)  | StandardScaled price, PCA(100)
Model features / Embeddings  | ML Eng/DS (choose model)      | Model (pretrained or trained) | Train & Inference                    | BERT/ResNet/categorical embedding

MLOps ensures the same steps run at inference to avoid train/serve skew.
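
A minimal sketch of that guarantee with scikit-learn: fit the featurization Pipeline once at training time, persist it, and load the same fitted artifact for serving (filenames here are hypothetical):

import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# --- Training time: fit the featurization steps once ---
X_train = np.random.rand(100, 10)  # stand-in for real training features

features = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=3)),
])
features.fit(X_train)
joblib.dump(features, "features.joblib")  # ship this artifact with the model

# --- Inference time: load and transform with the SAME fitted steps (no refit) ---
serving_features = joblib.load("features.joblib")
x_new = np.random.rand(1, 10)
dense_vector = serving_features.transform(x_new)  # shape: (1, 3)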

Example of Dense Vector

Environment setup:

python -m venv .densevector
source .densevector/bin/activate
pip install sentence-transformers

Encode a single text into a dense vector:

from sentence_transformers import SentenceTransformer

# Load a pre-trained model (MiniLM is small & fast)
model = SentenceTransformer('all-MiniLM-L6-v2')

text = "Laptop"

# Convert text into dense vector
vector = model.encode(text)

print("Dense Vector Shape:", text, vector.shape)
print("Dense Vector (first 10 values):", vector[:10])

Compare several texts with cosine similarity:

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Words
texts = ["Laptop", "Computer", "Pencil"]

# Encode all
vectors = model.encode(texts)

# encode() already returns a NumPy array; this cast is a harmless safeguard
vectors = np.array(vectors)

# Cosine similarity matrix
sim_matrix = cosine_similarity(vectors)

# Display similarity scores
for i in range(len(texts)):
    for j in range(i+1, len(texts)):
        print(f"Similarity({texts[i]} vs {texts[j]}): {sim_matrix[i][j]:.4f}")

#vectors #densevector #sparsevector #tabularvector

Ver 0.3.6

Last change: 2025-12-02