Yo! I am Jaideep (Jd) Chawla, aka jellybean ❄️ on x dot com.
Avg neural networks enjoyer. Feel free to go through stuff here.
- info
- recent posts
  - 2024-07-13 Fine-Tuning is Not as Straightforward as I Thought
  - 2024-05-14 Score-Based Generative Modeling with SDEs
  - 0001-01-01 Very Markdown, Very Syntax
- rss
- PRO TIP
  - never kill yourself
raw brain dump
synced to my obsidian logs. stuff that prolly doesn’t end up on x.
Note to self (2):
MAKE SURE SAVING CHECKPOINTS WORKS UNLESS YOU WANT TO WASTE 3HRS OF A TRAINING RUN
Note to self:
NEVER INSTALL PYTORCH THROUGH CONDA UNLESS YOU WANT TO WASTE 2.5HRS OF YOUR LIFE
pip is the goat
0122
new year. two months of sickness (physical and mental), but we are so back
1128
these are getting more and more infrequent.
only key takeaway from the last 10 days: use managed/unified memory with cuda kernels until you actually understand how gpu memory works… avoid cudaMemcpyAsync and cudaMemsetAsync for the time being…
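for future me, a sketch of what i mean (cupy instead of raw cuda c, and the kernel is a toy; the point is routing allocations through managed memory so there are no hand-rolled copies to get wrong):

```python
import cupy as cp

# route every cupy allocation through cudaMallocManaged so pages
# migrate between host and device automatically -- no explicit
# cudaMemcpyAsync / cudaMemsetAsync calls to mess up
cp.cuda.set_allocator(cp.cuda.malloc_managed)

# toy kernel standing in for whatever i'm actually writing
saxpy = cp.RawKernel(r'''
extern "C" __global__
void saxpy(float a, const float* x, const float* y, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * x[i] + y[i];
}
''', 'saxpy')

n = 1 << 20
x = cp.arange(n, dtype=cp.float32)   # lives in managed memory now
y = cp.ones(n, dtype=cp.float32)
out = cp.empty(n, dtype=cp.float32)

threads = 256
blocks = (n + threads - 1) // threads
saxpy((blocks,), (threads,), (cp.float32(2.0), x, y, out, cp.int32(n)))
cp.cuda.Device().synchronize()
```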
1117
time surely flies when you're down with the flu… https://github.com/assafelovic/gpt-researcher could use some automatic prompt optimization…
1102
not writing cursive with a fountain pen is chaotic evil.
1029
man, DSPy is awesome. at least now i don't feel like i'm getting overpaid for just writing prompts.
1028
deconstructing a search engine; let’s stick to one domain first (e.g. medical)
- there are multiple medical databases (PubMed, SemanticScholar, etc…)
- each has a traditional search and its own api
- now you need an “agent” that can crawl these databases and figure out how to query them
- and you need a model/workflow to transform a natural query into an api query.
- do you just tailor prompts for each database? kinda cringe.
- what about automated prompt engineering, DSPy?? (sketch below)
- search engines that leverage DSPy
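a minimal dspy sketch of the natural-query → api-query step (the signature and field names are made up, and the lm is whatever you'd actually configure):

```python
import dspy

# assumption: any dspy-supported model works here; this one is illustrative
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class NaturalToAPIQuery(dspy.Signature):
    """Turn a natural-language question into a database search query."""
    question = dspy.InputField()
    database = dspy.InputField(desc="e.g. PubMed or SemanticScholar")
    query = dspy.OutputField(desc="query string for that database's api")

to_query = dspy.Predict(NaturalToAPIQuery)
pred = to_query(question="do statins cause muscle pain?", database="PubMed")
print(pred.query)
```

the appeal: once it's a dspy program, an optimizer (e.g. BootstrapFewShot or MIPROv2) can tune the prompt per database instead of me hand-tailoring each one.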
1026
i got work to do but it’s saturday night so
1024
training curve btw; should have used early stopping (i figured it was just one epoch).
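for the record, the patience loop i should have had; train_one_epoch / evaluate / save_checkpoint are hypothetical stand-ins for the real loop:

```python
# stop once validation loss hasn't improved for `patience` epochs
best_val, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(max_epochs):
    train_one_epoch(model)          # hypothetical helper
    val_loss = evaluate(model)      # hypothetical helper
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        save_checkpoint(model)      # see note-to-self above
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```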
1023
ok i’ll bite: it's rag for the medical domain (the over-engineered gpt wrapper over a couple of sources), and i suppose i must research/experiment to make it better
other priorities today:
- applying to openai residency, preceded by embellishing my cv
- i’ll do the visa tomorrow
- working on cola and starting cuda stuff
switching gears to doing some query optimization/prompt engineering stuff; should line up well with the blog i’ve been trying to write for over a month
maybe if i do end up learning a bit about diffusion and vlms, i might become capable enough to work at moondream.
1022
dse-qwen2-korean should be ready in about 19 hours…burning a 4xA100 for 30hrs lemao
log-scale normalization is better than clipping?
now we’re doing feature-wise weighting before computing cosine similarity, which means features with higher variance in the training set will contribute more to the similarity calculation. it keeps the normalization benefits of cosine similarity while incorporating the feature-importance weights.
have to experiment with different normalization strategies for the feature weights: basic, softmax-based, min-max, log-scale.
also have to see how (sighs) FID scores change.
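numpy sketch of the weighted cosine plus the normalization candidates (variance-as-importance is my reading of “feature importance” here, not gospel):

```python
import numpy as np

def weighted_cosine(a, b, w):
    # weighted inner product: features with larger w contribute more
    aw, bw = a * np.sqrt(w), b * np.sqrt(w)
    return aw @ bw / (np.linalg.norm(aw) * np.linalg.norm(bw) + 1e-8)

def normalize_weights(v, how):
    if how == "basic":
        return v / v.sum()
    if how == "softmax":
        e = np.exp(v - v.max())     # shift for numerical stability
        return e / e.sum()
    if how == "min-max":
        return (v - v.min()) / (v.max() - v.min() + 1e-8)
    if how == "log-scale":
        return np.log1p(v) / np.log1p(v).sum()
    raise ValueError(how)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))     # stand-in for training features
v = X.var(axis=0)                   # per-feature variance
for how in ("basic", "softmax", "min-max", "log-scale"):
    print(how, weighted_cosine(X[0], X[1], normalize_weights(v, how)))
```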
1021
i am trying
so what am i working on right now? well, a bunch of things:
- at work i was pushing hard for multimodal rag based on colpali, so we compromised on dse; finetuning should be straightforward, hoping it works out well
- all you need then is this encoder + claude (rough sketch below).
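roughly the shape i have in mind, all names hypothetical (the dse_embed_* calls wrap the finetuned encoder, and the claude step is a comment because the real inputs are page screenshots):

```python
import numpy as np

# hypothetical wrappers around the finetuned dse encoder
doc_vecs = dse_embed_pages(page_images)   # (N, d), l2-normalized
q_vec = dse_embed_query("what does the report say about dosage?")

# on normalized vectors, cosine similarity is just a dot product
topk = np.argsort(doc_vecs @ q_vec)[::-1][:5]

# hand the top-k page screenshots to claude as image blocks and
# let it answer over them -- that's the whole pipeline
answer = ask_claude([page_images[i] for i in topk])  # hypothetical
```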
now, onto more interesting things:
- learning CUDA by writing kernels for compositional linear algebra https://github.com/wilson-labs/cola
- improving the guided diffusion technique proposed in https://github.com/Agentic-Learning-AI-Lab/procreate-diffusion-public
this alone is too much on my plate, but i also have linear algebra to teach and classes to attend (obviously)
good luck me
1019
computers understand bits, llms understand vectors
i might actually cook with this blog…
all you need to learn diffusion is
- https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
- https://yang-song.net/blog/2021/score/
estimated reading time for them combined is 78 mins
you can be an expert at diffusion in under 2 hours
that’s less than 2 weeks btw
diffusion models are a journey that’s equal parts math, magic, and machine learning.
1017
i don’t always use version control, but when i do i spam a bunch of Update README.md
1015
2 hours of sleep, on a sugar rush, and forcing myself through a lecture on approximate inference. i think i should just post that lmao.
next 72 hours are going to be brutal
so i’ve been using vim
- i have to learn cuda
- i have to relearn vector calc so that i can attempt this horrifying assignment on bayesian linear regression (posterior dumped at the bottom)
- have to train a 2b vlm
- proctor a couple of exams
and
if
time
permits
work on two research projects
am i too hopeful or thoroughly cooked
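appendix for future me, re: the bayesian linear regression item above. the standard gaussian posterior i'll have to re-derive (assuming prior $w \sim \mathcal{N}(0, \alpha^{-1}I)$ and observation noise with precision $\beta$, Bishop-style):

$$
p(w \mid X, y) = \mathcal{N}(w \mid m_N, S_N), \qquad
S_N = \left(\alpha I + \beta X^\top X\right)^{-1}, \qquad
m_N = \beta\, S_N X^\top y
$$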