If you do the exact same data import with the program in the same initial state, will it segfault again? (You may need to add code to record, for example, the contents of your data files before the import and exactly what is being imported.)
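One way to capture that initial state is to copy the data files aside and hash them before each import. This is only a rough sketch in Python, not part of the original program: the function name `snapshot_data_files` and both directory arguments are hypothetical, and it assumes the data lives in ordinary files under one directory.

```python
import hashlib
import shutil
from pathlib import Path


def snapshot_data_files(data_dir, snapshot_dir):
    """Copy data_dir to snapshot_dir and return {relative path: SHA-256 hex digest}.

    Both paths are hypothetical placeholders; snapshot_dir must not exist yet.
    """
    data_dir = Path(data_dir)
    snapshot_dir = Path(snapshot_dir)

    # Keep a full copy so the exact pre-import state can be restored later.
    shutil.copytree(data_dir, snapshot_dir)

    # Hash every copied file so two "initial states" can be compared cheaply.
    digests = {}
    for path in sorted(snapshot_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(snapshot_dir).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

If two runs start from snapshots with identical digests and are fed the same import, and only one of them segfaults, that by itself is useful information for a bug report.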
Or, if you start in a known state and run fifty data imports in a row, can you get it to segfault?
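A repeated-run check like that could be scripted roughly as follows. This is a sketch under assumptions: it supposes each import can be launched as a separate command (the command itself is hypothetical), and it relies on the POSIX convention that Python's `subprocess` reports a child killed by a signal as a negative return code. If the fifty imports have to happen inside one long-running session, the loop would belong in the Arc program instead.

```python
import signal
import subprocess


def run_until_segfault(cmd, max_runs=50):
    """Run cmd up to max_runs times; return the 1-based run that segfaulted, or None.

    cmd is a hypothetical argument list, e.g. a script that performs one import.
    On POSIX, a child killed by SIGSEGV has returncode == -signal.SIGSEGV.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode == -signal.SIGSEGV:
            return attempt
    return None
```

For example, `run_until_segfault(["./import-once.sh"])` (a made-up script name) would report which of the fifty runs died, or `None` if none did.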
Arc is not supposed to be able to segfault MzScheme. To MzScheme, Arc is just a big program written in Scheme. And MzScheme running Scheme programs is not supposed to segfault.
(If you aren't using the C foreign function interface, of course. A messed-up pointer can cause unrelated code to blow up later. So the above paragraph holds only if every library you use is either written in Scheme [and/or Arc, since Arc is written in Scheme] or part of the official MzScheme distribution.)
It's true that the PLT folks certainly don't want to be debugging your Arc program, but they do want to be able to get MzScheme to work. And the way to enable them to do that is to give them a test case that they can run to see the segfault. If you can. Even if it is thousands of lines of code and megabytes of data. If you give them a shell script "diedie" that runs a MzScheme program (that runs Arc and loads your program and imports your data) that segfaults, then you make it possible for them to find out what is causing the segfault.
"If you aren't using the C foreign function interface, of course."
Yeah, that phase actually performs stemming as well, using the FFI. For a long time that was my prime suspect, but I have never been able to get stemming to segfault in isolation, either within Arc or from a C program. And I have been able to get my program to segfault without the stemming.
You're right, I'll take another stab at a test case. And I'll not try so hard to make it tiny.