
Importing Particular Files or Directories Only

Excluding files is an extremely useful feature, but on some occasions it is better to include the files you need rather than exclude everything else. For example, you may have a big SVN project with source code files, compiled binaries, libraries, installation packs, resources, documentation and so on, all spread across the repository, while you want only the source code files in your new Git repository. That is the case the includePath feature can solve.

Suppose you have the following SVN project layout:

    /repository
        /project
            /trunk
                /bin
                    *.exe
                    *.dll
                /source
                    *.cpp
                    *.cs
                /docs
                    *.html
                    *.txt
                *.exe
                *.msi
                *.cpp
                *.cs
                *.txt
            /branches
                …
            /tags
                …

There are several branches and tags, and each of them has a structure similar to that of trunk, i.e. the same subdirectories and files of the same types. You want only source code files to be present in the new Git repository: every source code file from anywhere in the project, and all the directories that contain source code files, should be sent to Git, but nothing more. This can be done with the following configuration:

    includePath = *.cpp
    includePath = *.cs

After SubGit finishes the import (or mirror) to Git, the Git master branch (supposing trunk was mapped to master) will have the following layout:

    /source
        *.cpp
        *.cs
    *.cpp
    *.cs

and that is exactly what we wanted to achieve.
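The effect of the two includePath lines can be sketched in Python. This is only a rough model, not SubGit's implementation: it assumes that a pattern without a leading slash is matched against the file name at any depth, and all concrete file names below are invented for illustration.

```python
from fnmatch import fnmatchcase
import posixpath

# The two includePath values from the configuration in the text.
INCLUDE = ["*.cpp", "*.cs"]

# Hypothetical trunk contents; the concrete file names are invented.
TRUNK = [
    "/bin/app.exe", "/bin/core.dll",
    "/source/main.cpp", "/source/Util.cs",
    "/docs/index.html", "/docs/readme.txt",
    "/app.exe", "/setup.msi", "/main.cpp", "/Program.cs", "/notes.txt",
]

def included(path):
    # Assumption: name-only patterns are compared with the file name,
    # regardless of the directory the file lives in.
    name = posixpath.basename(path)
    return any(fnmatchcase(name, pattern) for pattern in INCLUDE)

kept = [p for p in TRUNK if included(p)]

# Git stores files, not directories, so only directories that still
# hold at least one kept file show up in the result.
kept_dirs = sorted({posixpath.dirname(p) for p in kept} - {"/"})

print(kept)
print(kept_dirs)
```

In this model only the source directory survives, because bin and docs are left without any files.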

Note that we didn't explicitly tell SubGit to remove the bin and docs directories, and in fact it didn't. Both directories became empty since they contained no included *.cpp or *.cs files, and they disappeared because Git cannot track empty directories. It's kind of a hack: we don't have to worry about which directories are kept, since with such a configuration every directory containing included files is preserved. That is, if we have *.cpp and *.cs files not only in the source directory but, say, in code too:

    /repository
        /project
            /trunk
                /bin
                    *.exe
                    *.dll
                /source
                    *.cpp
                    *.cs
                /code
                    *.cpp
                    *.cs
                /docs
                    *.html
                    *.txt
                *.exe
                *.msi
                *.cpp
                *.cs
                *.txt
            /branches
                …
            /tags
                …

then after the import finishes Git master will look like this:

    /source
        *.cpp
        *.cs
    /code
        *.cpp
        *.cs
    *.cpp
    *.cs

But what if we need to preserve the source directory with all its files, but we don't want to import code?

That's another story that requires more configuration. In addition to the includePath directives, we need to explicitly tell SubGit to exclude the unneeded directory with an excludePath directive, so the final configuration will look like this:

    excludePath = /code/**
    includePath = *.cpp
    includePath = *.cs

The code directory is excluded explicitly, and all the other unneeded directories are removed because they contain no included files, so we get the intended outcome: the source directory and *.cpp and *.cs files only.
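A small Python sketch of how this combination might be evaluated: the exclude patterns are checked first, and only paths that pass them are tested against the include patterns. The matching rules are a simplification assumed for illustration (patterns starting with a slash are compared with the whole path, other patterns with the file name only), and the file names are invented.

```python
from fnmatch import fnmatchcase
import posixpath

# Values taken from the configuration in the text.
EXCLUDE = ["/code/**"]
INCLUDE = ["*.cpp", "*.cs"]

def matches(path, pattern):
    # Simplified model (an assumption, not SubGit's actual matcher):
    # "/..." patterns apply to the full path, others to the file name.
    target = path if pattern.startswith("/") else posixpath.basename(path)
    return fnmatchcase(target, pattern)

def imported(path):
    # An exclude match wins, whatever the include patterns say.
    if any(matches(path, pattern) for pattern in EXCLUDE):
        return False
    return any(matches(path, pattern) for pattern in INCLUDE)

# Hypothetical trunk contents; the concrete file names are invented.
TRUNK = [
    "/bin/app.exe",
    "/source/main.cpp", "/source/Util.cs",
    "/code/extra.cpp", "/code/Helper.cs",
    "/docs/readme.txt",
    "/main.cpp", "/notes.txt",
]

kept = [p for p in TRUNK if imported(p)]
print(kept)
```

The files under code carry included extensions, yet they are dropped because the exclude check runs first.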

Let's go further and suppose we have several directories that contain files of the included types, and we want to include just some of them, e.g. we have the following SVN project:

    /repository
        /project
            /trunk
                /bin
                    *.cpp
                    *.cs
                    *.exe
                    *.dll
                /lib
                    *.dll
                    *.so
                    *.cpp
                    *.cs
                /framework
                    *.cpp
                    *.cs 
                    *.py
                /source
                    *.cpp
                    *.cs
                /code
                    *.py
                    *.cpp
                    *.cs
                /docs
                    *.html
                    *.txt
                *.exe
                *.msi
                *.cpp
                *.cs
                *.txt
            /branches
                …
            /tags
                …

Say we want to import source and code but leave all the other directories behind. The first thought that comes to mind is to exclude all the directories and then include those we need, that is, to set the configuration like this:

    excludePath = /*/**
    includePath = /source/**
    includePath = /code/**
    includePath = *.cpp
    includePath = *.cs

Regrettably, it won't work as intended. Such a configuration will exclude all the directories and won't include any: excludePath takes priority over includePath when their areas intersect. Thus we need to exclude all the unneeded directories explicitly:

    excludePath = /bin/**
    excludePath = /lib/**
    excludePath = /framework/**
    includePath = /source/**
    includePath = /code/**
    includePath = *.cpp
    includePath = *.cs

Note that we may omit directories that don't contain files of the included types: they become empty and are dropped automatically by Git. Thus we have to explicitly mention only those directories that contain included files and that, at the same time, we want to exclude.
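The rule in this note can be turned into a small helper that suggests which excludePath lines are actually needed. The sketch below is hypothetical and not part of SubGit: it takes an invented file listing of trunk, finds the top-level directories that hold files of the included types, and subtracts the directories we want to keep.

```python
from fnmatch import fnmatchcase
import posixpath

NAME_PATTERNS = ["*.cpp", "*.cs"]      # included file types from the text
WANTED_DIRS = {"/source", "/code"}     # directories we want in Git

# Hypothetical trunk contents; the concrete file names are invented.
TRUNK = [
    "/bin/tool.cpp", "/bin/tool.exe",
    "/lib/core.dll", "/lib/glue.cs",
    "/framework/base.cpp", "/framework/gen.py",
    "/source/main.cpp", "/source/Util.cs",
    "/code/build.py", "/code/extra.cpp",
    "/docs/index.html", "/docs/readme.txt",
    "/setup.msi", "/main.cpp",
]

def has_included_type(path):
    name = posixpath.basename(path)
    return any(fnmatchcase(name, pattern) for pattern in NAME_PATTERNS)

def top_dir(path):
    # "/bin/tool.cpp" -> "/bin"; files directly in trunk -> None
    parts = path.split("/")
    return "/" + parts[1] if len(parts) > 2 else None

# Directories holding included file types that we do not want to keep.
needs_exclude = sorted(
    {top_dir(p) for p in TRUNK if has_included_type(p)} - WANTED_DIRS - {None}
)
exclude_paths = [d + "/**" for d in needs_exclude]
print(exclude_paths)
```

The docs directory never appears in the result, since it holds no files of the included types and so needs no explicit directive.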

Used together, excludePath and includePath provide extremely flexible control over what is imported and what is left in SVN. The only nuance is that excludePath takes priority, and this has to be taken into account during configuration: sometimes it requires explicit directives to be set for many SVN entities.
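To make the priority rule concrete, the following Python sketch compares the broad exclude with the explicit one under the same include patterns. As before, the matcher is a simplified stand-in assumed for illustration rather than SubGit's own logic, and the file names are invented.

```python
from fnmatch import fnmatchcase
import posixpath

def matches(path, pattern):
    # Simplified model (an assumption): "/..." patterns apply to the
    # full path, other patterns to the file name only.
    target = path if pattern.startswith("/") else posixpath.basename(path)
    return fnmatchcase(target, pattern)

def kept(paths, include, exclude):
    # A path is imported only if no exclude pattern matches it
    # and at least one include pattern does.
    return [
        p for p in paths
        if not any(matches(p, e) for e in exclude)
        and any(matches(p, i) for i in include)
    ]

INCLUDE = ["/source/**", "/code/**", "*.cpp", "*.cs"]
BROAD_EXCLUDE = ["/*/**"]
EXPLICIT_EXCLUDE = ["/bin/**", "/lib/**", "/framework/**"]

# Hypothetical trunk contents; the concrete file names are invented.
TRUNK = [
    "/bin/tool.cpp", "/lib/glue.cs", "/framework/base.cpp",
    "/source/main.cpp", "/code/build.py",
    "/docs/index.html", "/main.cpp", "/setup.msi",
]

broad_result = kept(TRUNK, INCLUDE, BROAD_EXCLUDE)
explicit_result = kept(TRUNK, INCLUDE, EXPLICIT_EXCLUDE)

print(broad_result)
print(explicit_result)
```

With the broad exclude only files lying directly in trunk survive in this model, while the explicit excludes keep source and code as intended.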
