jd
Yo! I am Jaideep (Jd) Chawla, aka jellybean ❄️ on x dot com.
Avg neural networks enjoyer. Feel free to go through stuff here.

raw brain dump

synced to my obsidian logs. stuff that prolly doesn’t end up on x.


Note to self (2):
MAKE SURE SAVING CHECKPOINTS WORKS UNLESS YOU WANT TO WASTE 3HRS OF A TRAINING RUN
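
(roughly the kind of smoke test i mean — save and reload a checkpoint on a toy model before kicking off the real run; everything below is a placeholder, not the actual training script)

```python
import torch
import torch.nn as nn

# stand-in model/optimizer; swap in the real ones from the training script
model = nn.Linear(16, 4)
optim = torch.optim.AdamW(model.parameters(), lr=3e-4)

ckpt_path = "ckpt_smoke_test.pt"
torch.save(
    {"model": model.state_dict(), "optim": optim.state_dict(), "step": 0},
    ckpt_path,
)

# reload into fresh objects to confirm the round trip actually works
model2 = nn.Linear(16, 4)
optim2 = torch.optim.AdamW(model2.parameters(), lr=3e-4)
state = torch.load(ckpt_path, map_location="cpu")
model2.load_state_dict(state["model"])
optim2.load_state_dict(state["optim"])
print("checkpoint round trip ok, resuming from step", state["step"])
```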

Note to self:
NEVER INSTALL PYTORCH THROUGH CONDA UNLESS YOU WANT TO WASTE 2.5HRS OF YOUR LIFE
pip is the goat


0122

new year, two months of sickness (physical and mental) we are so back

1128

these are getting more and more infrequent. the only key takeaway from the last 10 days is to use managed/unified memory with cuda kernels until you understand how memory works…
avoid cudaMemcpyAsync and cudaMemsetAsync for the time being…

1117

time surely flies when you are down with the flu… https://github.com/assafelovic/gpt-researcher could use some automatic prompt optimization…

1102

not writing cursive with a fountain pen is chaotic evil.

1029

man, DSPy is awesome, at least now i don’t feel like i’m getting overpaid for just writing prompts.

1028

deconstructing a search engine, let’s stick to a domain first (e.g. medical)

1026

i got work to do but it’s saturday night so

1024

training curve btw, should have used early stopping (figured it was just one epoch).
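
(a minimal early-stopping sketch for reference — toy data and the patience value are made up, the real run would hook this into the actual train/val loop)

```python
import torch
import torch.nn as nn

# toy data/model just to make the loop runnable; swap in the real ones
x, y = torch.randn(512, 16), torch.randn(512, 1)
model = nn.Linear(16, 1)
optim = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

best_val, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(100):
    # train on the first 400 rows, validate on the rest
    optim.zero_grad()
    loss_fn(model(x[:400]), y[:400]).backward()
    optim.step()

    with torch.no_grad():
        val_loss = loss_fn(model(x[400:]), y[400:]).item()

    if val_loss < best_val - 1e-4:      # require a real improvement
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # no improvement for `patience` epochs in a row
            print(f"early stop at epoch {epoch}, best val loss {best_val:.4f}")
            break
```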

1023

ok i’ll bite, it’s rag for the medical domain (the over-engineered gpt wrapper over a couple of sources) and i must research/experiment to make it better i suppose
other priorities today:


switching gears to doing some query optimization/prompt engineering stuff; should line up well with the blog i’ve been trying to write for over a month


maybe if i do end up learning somewhat about diffusion and vlms, i might become capable enough to work at moondream.

1022

dse-qwen2-korean should be ready in about 19 hours…burning a 4xA100 for 30hrs lemao


log-scale normalization is better than clipping?


now we’re doing feature-wise weighting before computing cosine similarity, which means features with higher variance in the training set will contribute more to the similarity calculation. it maintains the normalization benefits of cosine similarity while incorporating the feature-importance weights.
have to experiment with different normalization strategies for the feature weights: basic, softmax-based, min-max, log-scale.
also have to see how (sighs) FID scores change.
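
(rough numpy sketch of what i mean — variance-based feature weights, the four weight-normalization strategies, plus hard clipping for comparison with the log-scale question above; the real pipeline may differ)

```python
import numpy as np

def normalize_weights(w, how="basic"):
    # weight-normalization strategies to compare
    if how == "basic":                            # scale to sum to 1
        return w / w.sum()
    if how == "softmax":
        e = np.exp(w - w.max())
        return e / e.sum()
    if how == "minmax":
        return (w - w.min()) / (w.max() - w.min() + 1e-8)
    if how == "log":                              # log-scale, squashes outliers smoothly
        return np.log1p(w)
    if how == "clip":                             # hard clipping, for comparison
        return np.clip(w, 0, np.percentile(w, 95))
    raise ValueError(how)

def weighted_cosine(a, b, w):
    # feature-wise weighting before cosine similarity: features with larger w
    # (higher training-set variance) contribute more to the score
    aw, bw = a * w, b * w
    return aw @ bw / (np.linalg.norm(aw) * np.linalg.norm(bw) + 1e-8)

# raw importance weight = per-feature variance over the training set
train_feats = np.random.randn(1000, 64)           # stand-in for real training features
raw_w = train_feats.var(axis=0)

a, b = np.random.randn(64), np.random.randn(64)
for how in ["basic", "softmax", "minmax", "log", "clip"]:
    w = normalize_weights(raw_w, how)
    print(how, round(float(weighted_cosine(a, b, w)), 4))
```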

1021

i am trying

so what am i working on right now? well, a bunch of things:

now, onto more interesting things:

this in itself is too much on my plate, but i also have linear algebra to teach and classes (obviously)

good luck me

1019

computers understand bits, llms understand vectors

i might actually cook with this blog…


all you need to learn diffusion is

estimated reading time for them combined is 78 mins
you can be an expert at diffusion in under 2 hours
that’s less than 2 weeks btw


diffusion models are a journey that’s equal parts math, magic, and machine learning.

1017

i don’t always use version control, but when i do, i spam a bunch of Update README.md commits

1015

2 hours of sleep, on a sugar rush, and forcing myself through a lecture on approximate inference. i think i should just post that lmao.


next 72 hours are going to be brutal
so i’ve been using vim

and
if
time
permits
work on two research projects

am i too hopeful or thoroughly cooked