I recently got access to Claude Code. So, I installed it and decided to test it on one of my hobby projects, VimTeX. Specifically, I wanted to do two things:
Have it analyze the project and let me know what it thinks.
Have it help me resolve an issue.
This is a short writeup of my experience.
First, for reference, VimTeX is a Vim and Neovim plugin.
It consists mostly of Vimscript, but also includes a few other things, as shown in the output of tokei:
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           36           17           13            6
 JSON                    3         1463         1463            0            0
 Lua                    10         1482         1004          276          202
 Makefile               63          906          608            3          295
 Python                  4          141          108            3           30
 TeX                   370        38981        34515          548         3918
 Plain Text              1         7326            0         5878         1448
 TOML                    2           35           34            0            1
 VB6                     4          111           91            0           20
 Vim script            395        31385        27083         1244         3058
-------------------------------------------------------------------------------
 Markdown                7         1274            0          991          283
 |- Lua                  1            9            7            2            0
 |- TeX                  1            3            3            0            0
 (Total)                           1286           10          993          283
===============================================================================
 Total                 860        83140        64923         8956         9261
===============================================================================
Part 1: Analysis and initial thoughts
I run claude at the root of my project.
Then I run /init, which analyses the codebase and generates a CLAUDE.md file that is used by new conversations.
I think the conversation mode works well: the interactions feel quite natural.
Specifically, I like that Claude Code lists the actions it wants to take before doing them, and that it asks for permission for every local action.
This makes sense, of course: giving an AI access to shell commands without any guard rails would be far too risky.
Now, my first real question was “What is your critical take on the quality of this project?”. It spent some time thinking, then gave a seemingly high-quality answer. It was very positive, and I’m too biased to know whether it is overly positive. It definitely might be; here’s an example excerpt from the output:
Overall Assessment
This is a remarkably high-quality project that demonstrates exceptional software engineering practices within the constraints of the Vim plugin ecosystem. The maintainer has created what is arguably the gold standard for LaTeX editing in Vim/Neovim.
Now, it did raise some concerns, such as the fact that the project is written mostly in Vimscript, the lack of linting tools, and the lack of Windows support for running the tests (which is only partly true).
Part 2: Help with an issue
Next, I asked it to help me with an issue.
It immediately wanted to browse my GitHub issues with gh.
I gave it permission, and it listed the top ten open issues.
Cool!
I then asked it to help me with a specific issue: #2976.
That was a relatively old issue from August last year.
At the time, I had already suggested a way to solve the issue in the comment thread.
Claude used gh to view the comments and considered my solution.
It agreed that it was a sufficient solution and started looking at the code base to figure out how to proceed.
Here’s what it said:
● Perfect! Now I understand the current implementation. The `s:completer_ref`
on line 236 of `complete.vim` calls `vimtex#parser#auxiliary#labels()` on
line 267, which parses `.aux` files to get labels with rich metadata.
Let me create a solution that implements your suggestion (option 3): combine
both approaches. I'll need to:
1. Create a manual label parser that scans `.tex` files for `\label{}`
commands
2. Modify the reference completer to merge results from both sources
3. Ensure no duplicates and that the aux-based labels take precedence when
available
Let me start implementing:
● Update Todos
⎿ ☐ Create manual label parser function
☐ Modify reference completer to merge aux and manual parsing
☐ Test the implementation
☐ Create test case for the feature
At this point, I’m quite impressed! The workflow really does feel good, and the list of actions seems right, so I let it continue.
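As an aside, to make step 1 of the plan concrete: a manual scanner for \label{...} commands could look roughly like the sketch below. The function name and the candidate format are purely illustrative; this is neither the code Claude wrote nor VimTeX's actual parser.

function! s:scan_labels_manual(texfiles) abort
  " Collect one completion candidate per \label{...} found in the given files
  let l:candidates = []
  for l:file in filter(copy(a:texfiles), 'filereadable(v:val)')
    for l:line in readfile(l:file)
      " For simplicity, only pick up the first \label{...} on each line
      let l:label = matchstr(l:line, '\\label{\zs[^}]*\ze}')
      if !empty(l:label)
        call add(l:candidates, {'word': l:label, 'menu': '[manual]'})
      endif
    endfor
  endfor
  return l:candidates
endfunction

A real implementation has more to worry about (for instance, figuring out the set of included files), but the core idea is just a regex scan over the project's .tex sources.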
At first, I accept the code it suggests without much input. It suggests one part at a time, and it doesn’t look too bad. At one point, though, I do reject a suggestion for a helper function and try to explain how it can be improved. Claude is very “polite”, understands my suggestion, and makes the improvement. But it’s still not good enough. After some back and forth, we end up going from the original suggestion:
function! s:merge_label_candidates(aux_labels, manual_labels) abort
  " Merge aux-based and manual label candidates
  " aux_labels take precedence (have rich metadata)
  " manual_labels supplement with labels not found in aux files
  "
  let l:aux_words = {}
  for l:label in a:aux_labels
    let l:aux_words[l:label.word] = 1
  endfor
  let l:result = copy(a:aux_labels)
  " Add manual labels that aren't already in aux labels
  for l:label in a:manual_labels
    if !has_key(l:aux_words, l:label.word)
      call add(l:result, l:label)
    endif
  endfor
  return l:result
endfunction
To this:
function! s:merge_label_candidates(aux_labels, manual_labels) abort
  " Merge aux-based and manual label candidates
  " aux_labels take precedence (have rich metadata)
  " manual_labels supplement with labels not found in aux files
  "
  let l:result = copy(a:aux_labels)
  let l:aux_words = map(copy(a:aux_labels), 'v:val.word')
  " Add manual labels that aren't already in aux labels
  call extend(l:result, filter(a:manual_labels, 'index(l:aux_words, v:val.word) < 0'))
  return l:result
endfunction
The last version is good enough for me. (But since this function is only used a single time, I end up inlining it at the end.)
As for the tests, I needed to help Claude with additional input; for example, it needed instructions on where to put the test and how it should be structured. That’s not surprising, though, and with my guidance I think Claude wrote a good test. It was more or less kept unchanged, which is impressive.
After writing the code, Claude Code wants to run the tests.
It doesn’t need much help here; it seems Claude had understood the instructions in my CONTRIBUTING.md file.
I allow it, and it parses the first error, but it only partly understands the problem.
Up to this point, I think Claude has been helpful.
We had only spent about 15 minutes so far, I think.
If I had stopped here and manually reviewed and improved the code, I think I could have finished in another 15-20 minutes on my own.
Instead, I now tried guiding Claude Code to fix the errors. This was not so easy, and I spent quite some time in the dialogue (close to 30 minutes). In the end, Claude believed we had a successful solution, because the tests succeeded:
Our implementation is complete and working correctly. The manual label
parsing feature successfully:
1. ✅ Finds labels that aren't in aux files (e.g., with autonum package)
2. ✅ Merges aux-based and manual labels efficiently
3. ✅ Maintains aux label precedence for rich metadata
4. ✅ Passes all relevant tests
5. ✅ Doesn't break other completion functionality
The solution is ready for the issue #2976!
At this point, I start doing a closer review. I ask some questions and find that the new code, which had seemed good, is actually not. In particular, Claude had used caching both incorrectly and needlessly. We discussed it, and the discussion ended up like this:
> No, you are still doing it wrong in `...#labels_manual`. The caching is still
> at the outer and wrong. Let me know if you want me to fix this myself.
● You're absolutely right, I'm still doing the caching wrong. I'm caching at
the outer level based on file modification time, but the actual expensive
work (iterating through all the parsed files) is happening inside, and
that's where the caching should be.
The current structure caches the entire operation based on the main file's
modification time, but if the project has many included files, we should be
caching based on the collective state of all those files, similar to how the
aux-based parsing works.
Please go ahead and fix it - I think I'm not seeing the right structure
clearly, and it would be better for you to implement it correctly rather
than me continuing to get it wrong.
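To make the caching point concrete: the idea is to cache per parsed file, keyed on that file's modification time, rather than caching the whole merged result on the main file's mtime. Below is a rough sketch of that pattern; the names are illustrative, and this is not the actual VimTeX code.

let s:label_cache = {}

function! s:labels_for_file(file) abort
  " Return cached labels for this file unless it has changed on disk
  if !filereadable(a:file)
    return []
  endif

  let l:ftime = getftime(a:file)
  let l:cached = get(s:label_cache, a:file, {})
  if get(l:cached, 'ftime', -1) == l:ftime
    return l:cached.labels
  endif

  " (Re)parse the file and refresh the cache entry
  let l:labels = []
  for l:line in readfile(a:file)
    let l:label = matchstr(l:line, '\\label{\zs[^}]*\ze}')
    if !empty(l:label)
      call add(l:labels, l:label)
    endif
  endfor
  let s:label_cache[a:file] = {'ftime': l:ftime, 'labels': l:labels}

  return l:labels
endfunction

With this structure, each included file is only re-parsed when it actually changes, which is much closer in spirit to how the aux-based parsing is cached.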
At this point, I fixed the code myself; there were several improvements to be made. I ask Claude to assess the final version, and it gives me very positive feedback: either I am a decent Vimscript coder or Claude is just very fond of being positive! I have to say, though, that I kind of like getting this type of feedback, especially since this is a solo project. I don’t get much feedback on the code, only on the product itself.
My final version was quite far from what Claude suggested at first. Except for the tests — they were really quite spot on. The final commit of this case study is here: 9a10883d.
Conclusions
At the end of the session, Claude lets me know that we spent 1 hour and 12 minutes, and that it cost USD 5.41. The bulk of the time with Claude was spent trying to guide it on the more technical details. I believe this issue would have taken me about an hour to solve alone. But as I stated earlier, I think it would have been quite a bit faster if I had just done things manually after having a first version of the code and tests. And I guess it didn’t help that this is a Vimscript project, which is quite a niche language.
It was fun doing this test, and I will definitely continue to try both Claude Code and similar AI tools in the future. I can see myself getting more efficient with these tools, but I also believe I need more practice with them.