svndump-utils progress

My project svndump-utils is coming along really well and will hopefully be finished by next week.

The initial goal described in the wiki specification page has been fulfilled "in spirit". The real goal is to be able to extract Subversion projects and feed them to tools like git-svnimport (or the future svn2darcs scripts). Of course, this transformation must be done without losing the change history (otherwise there would be no point in doing it).

svndump-utils is built around a few strong ideas:

history: extraction of liveness (add/remove) and copy (copyfrom) information for every node (file or directory). This makes it possible to know what is alive at each revision. It also defines entry points: for example, extracting the project under project1@32 means project1/ at revision 32. The history is processed as a standard graph.
filter: a stacked iterator over the stream of svn dump records. This structure represents a basic operation that can be performed on an svn dump file. Each filter should provide a next_record function and the computed history of its stream. In most cases, a filter computes a new history by applying some graph processing to a clone of the previous filter's history, and provides it to the next filter. Based on this history, the filter just has to remove the nodes that are not alive in its own history. For now, the provided filters are:
- Load: read a svn dump file (first filter)
- Save: save a svn dump file (last filter)
- Include: include a specific node and everything connected to it (copy, children, parent)
- Exclude: exclude a specific node and everything connected to it (copy, children)
- DropEmptyRev: remove empty revisions from the stream
- Reparent: given a specific node, make all connected nodes be under the same node
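
To make the history idea more concrete, here is a minimal sketch in Python. It is not the svndump-utils code: the NodeEvent record, its fields and the alive_at and copy_edges helpers are all hypothetical names invented for the illustration, and the model is deliberately simplified (a directory copy only records the copied node itself, not the children it brings along).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NodeEvent:
    """One simplified node record (hypothetical shape, not the real dump format)."""
    rev: int
    path: str
    action: str                                  # "add" or "delete"
    copyfrom: Optional[Tuple[str, int]] = None   # (source path, source revision)


def alive_at(events, rev):
    """Return the set of paths that are alive at revision `rev`."""
    alive = set()
    # sorted() is stable, so events of the same revision keep their order.
    for ev in sorted(events, key=lambda e: e.rev):
        if ev.rev > rev:
            break
        if ev.action == "add":
            alive.add(ev.path)
        elif ev.action == "delete":
            # Deleting a directory also kills everything below it.
            prefix = ev.path + "/"
            alive = {p for p in alive
                     if p != ev.path and not p.startswith(prefix)}
    return alive


def copy_edges(events):
    """Edges of the copy graph: (source path, source rev) -> (path, rev)."""
    return [(ev.copyfrom, (ev.path, ev.rev)) for ev in events if ev.copyfrom]


# Made-up sample history, only there to exercise the functions.
events = [
    NodeEvent(1, "project1", "add"),
    NodeEvent(1, "project1/main.c", "add"),
    NodeEvent(2, "project1/test", "add"),
    NodeEvent(3, "project1/test", "delete"),
    NodeEvent(4, "project2", "add", copyfrom=("project1", 3)),
]
```

With this sample, alive_at(events, 2) contains project1/test while alive_at(events, 3) no longer does, and copy_edges(events) gives the single edge from project1@3 to project2@4. An entry point such as project1@32 is then just a (path, revision) pair used as the starting node when walking this graph.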

A typical filter configuration:

Load(svndump.file) -> Include(project1@32) -> Exclude(project1/test@34) -> DropEmptyRev -> Save(svndump-clear.file)
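
The stacking in the chain above can be sketched with Python generators, where calling next() on the last stage plays the role of next_record. This is only an illustration under strong assumptions: the Revision record and the four functions are hypothetical, plain path-prefix matching stands in for the real history-based liveness check, and the revision part of an entry point like project1@32 is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """Simplified revision: a number and the paths it touches (hypothetical)."""
    number: int
    nodes: list = field(default_factory=list)


def load(revisions):
    """Stand-in for Load: yields revisions from memory instead of a dump file."""
    yield from revisions


def include(stream, root):
    """Keep only the nodes equal to `root` or located below it."""
    prefix = root + "/"
    for rev in stream:
        kept = [p for p in rev.nodes if p == root or p.startswith(prefix)]
        yield Revision(rev.number, kept)


def exclude(stream, root):
    """Drop the nodes equal to `root` or located below it."""
    prefix = root + "/"
    for rev in stream:
        kept = [p for p in rev.nodes if p != root and not p.startswith(prefix)]
        yield Revision(rev.number, kept)


def drop_empty_rev(stream):
    """Skip revisions that no longer contain any node."""
    for rev in stream:
        if rev.nodes:
            yield rev


# Made-up input, only there to exercise the chain.
dump = [
    Revision(1, ["project1", "project1/main.c"]),
    Revision(2, ["project2/readme"]),
    Revision(3, ["project1/test/t1.c"]),
]

# Same shape as the chain above: each stage pulls from the previous one.
pipeline = drop_empty_rev(
    exclude(include(load(dump), "project1"), "project1/test"))
result = list(pipeline)
```

Each stage only pulls a record from the previous one when asked, so the whole dump never has to be held in memory. In this sketch no history is passed along; in the design described above, each stage would also hand its computed history to the next one.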

At the beginning, I was not very confident in my ability to compute the history to pass to the next filter. In fact, now that these utils are written, the task turned out to be quite simple. It is only a matter of iterating through nodes... When classical algorithms can be applied, computer science is much simpler!

Blog of Sylvain Le Gall
