Thursday, August 8, 2013

Shared Library Symbol Conflicts (on Linux)

I don't work with shared libraries very often, but I came across a shocking problem in a legacy product. Because the application and the shared library both defined the same symbol (i.e. the same function), and because no effort was made to control what the .so exported, the wrong definition was called at runtime. The behaviour is consistently reproducible, and it can be avoided by setting up a symbol barrier when creating the shared object file. The tools you need are described on gnu.org. An in-depth explanation is available on akkadia.org. See also the Solaris Solution.

Symbol Conflicts

Summary of Switches

g++
    -c <file>  Compile the source file, but do not link. Output is in the form of an
               object file.
    -o <file>  Specify the name of the output file.
    -s         Remove all symbol table and relocation information from the executable.
    -L<dir>    Add to the list of directories to be searched for libraries.
    -l<lib>    Search the named library when linking. The linker searches and 
               processes libraries and object files in the order they are specified. 
               The linker searches for a file named lib<lib>.so or lib<lib>.a in
               system directories plus any that you specify with -L.
    -shared    Produce a shared object. For predictable results you must specify the 
               same set of options that were used to generate code (-fpic, -fPIC, or 
               model suboptions).

The Problem

Define a simple worker.

work.cxx

#include <iostream>
void DoThing()
{
  printf("work \n");
}

Define a simple application.

main.cxx

#include <iostream>
void DoThing();

int main()
{
  printf("start \n");
  DoThing();
  printf("finished \n");
  return 0;
}

Compile the worker into an object file and wrap it as a static lib.

g++ -c work.cxx -o work.o
ar rc libwork.a work.o

Compile the application into an object file and wrap it as a static lib.

g++ -c main.cxx -o main.o
ar rc libmain.a main.o

Link the static libs into an executable. Note that the linker scans the libraries from left to right and only searches libraries further along the line for symbols that are used but not yet defined. Thus the lowest-level libs should go rightmost, and when the same symbol is defined in more than one library the leftmost definition is used. With the libraries in the wrong order the link fails.

g++ -s -L. -o main.exe -lwork -lmain

./libmain.a(main.o): In function `main':
main.cxx:(.text+0x90): undefined reference to `DoThing()'
collect2: ld returned 1 exit status

Link successfully.

g++ -s -L. -o main.exe -lmain -lwork

./main.exe

start
work
finished
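
The order sensitivity above follows from how the static linker consumes archives: a single left-to-right pass that pulls in only those members defining a symbol that is currently undefined. Below is a rough C++ model of that pass, not the real ld algorithm. The names Member, Archive, LinkResult and LinkArchives are invented for this sketch, and it assumes the only symbol wanted at the start is main (requested by the C runtime startup code).

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One object file inside an archive: the symbols it defines and the ones it needs.
struct Member {
  std::string name;
  std::set<std::string> defines;
  std::set<std::string> needs;
};
using Archive = std::vector<Member>;

struct LinkResult {
  std::map<std::string, std::string> definedBy;  // symbol -> member that supplied it
  std::set<std::string> undefined;               // symbols still unresolved at the end
};

// Hypothetical model: visit the archives left to right, each archive exactly once.
LinkResult LinkArchives(const std::vector<Archive>& archives) {
  LinkResult result;
  result.undefined.insert("main");  // assumption: the startup code references main
  for (const Archive& archive : archives) {
    bool pulled = true;
    while (pulled) {  // rescan this archive until no further member is needed
      pulled = false;
      for (const Member& member : archive) {
        bool wanted = false;
        for (const std::string& sym : member.defines) {
          if (result.undefined.count(sym)) wanted = true;
        }
        if (!wanted) continue;  // members that satisfy nothing are skipped
        for (const std::string& sym : member.defines) {
          result.definedBy.emplace(sym, member.name);  // the first definition wins
          result.undefined.erase(sym);
        }
        for (const std::string& sym : member.needs) {
          if (!result.definedBy.count(sym)) result.undefined.insert(sym);
        }
        pulled = true;
      }
    }
  }
  return result;
}
```

Feeding the model {libwork, libmain} leaves DoThing in the undefined set, while {libmain, libwork} resolves everything, which matches the two link attempts above.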

Define a simple conflict.

conflict.cxx

#include <iostream>
void DoThing()
{
  printf("conflict \n");
}

Compile the conflict into an object file and wrap it as a static lib.

g++ -c conflict.cxx -o conflict.o
ar rc libconflict.a conflict.o

Link (without warning) with priority given to work.

g++ -s -L. -o main.exe -lmain -lwork -lconflict

./main.exe

start
work
finished

Link (without warning) with priority given to conflict.

g++ -s -L. -o main.exe -lmain -lconflict -lwork

./main.exe

start
conflict
finished

Instead of a static library, package conflict as a shared library.

rm libconflict.a
g++ -shared conflict.o -o libconflict.so

Link for simple use of the shared library. In this case -lconflict resolves to libconflict.so instead of libconflict.a.

g++ -s -L. -o main.exe -lmain -lconflict

export LD_LIBRARY_PATH=.
./main.exe

start
conflict
finished

Now, introduce a new layer to call the conflict within the shared library.

layer.cxx

#include <iostream>
void DoThing();

void DoLayer()
{
  printf("layer \n");
  DoThing();
}

g++ -c layer.cxx -o layer.o

Link the shared library.

g++ -shared layer.o conflict.o -o libconflict.so

And update the application to call the layer instead of calling the conflict directly.

main.cxx

#include <iostream>
void DoLayer();

int main()
{
  printf("start \n");
  DoLayer();
  printf("finished \n");
  return 0;
}

Compile, link, execute.

g++ -c main.cxx -o main.o
ar rc libmain.a main.o
g++ -s -L. -o main.exe -lmain -lconflict

export LD_LIBRARY_PATH=.
./main.exe

start
layer
conflict
finished

Now re-introduce the worker to observe the conflict.

main.cxx

#include <iostream>
void DoThing();
void DoLayer();

int main()
{
  printf("start \n");
  DoThing();
  DoLayer();
  printf("finished \n");
  return 0;
}

Compile, link, execute.

g++ -c main.cxx -o main.o
g++ -c work.cxx -o work.o
ar rc libmain.a main.o work.o
g++ -s -L. -o main.exe -lmain -lconflict

export LD_LIBRARY_PATH=.
./main.exe

start
work
layer
work
finished

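Because work.o exists in libmain.a, the DoThing() call from main() correctly executes from work.o. Surprisingly, the DoThing() call from layer.o also executes from work.o instead of from conflict.o, as might have been expected. At runtime the dynamic linker searches the executable before any shared library, so the executable's DoThing() takes precedence over (interposes) the copy inside libconflict.so.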
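
One way to picture what happened: at runtime the dynamic linker resolves each exported symbol by walking the loaded objects in order, executable first, and taking the first definition it finds. The following toy model illustrates only that search order. LoadedObject and Resolve are invented names, and real lookup (symbol versions, weak symbols and so on) is more involved.

```cpp
#include <set>
#include <string>
#include <vector>

// A loaded ELF object reduced to its name and the symbols it exports.
struct LoadedObject {
  std::string name;
  std::set<std::string> exported;
};

// Hypothetical model of the global lookup scope: return the name of the first
// object, in load order, that exports the symbol, or an empty string if none does.
std::string Resolve(const std::vector<LoadedObject>& scope, const std::string& symbol) {
  for (const LoadedObject& object : scope) {
    if (object.exported.count(symbol)) return object.name;
  }
  return "";
}
```

With main.exe exporting DoThing ahead of libconflict.so, Resolve returns main.exe for DoThing, which is the behaviour seen in the output above. Any fix therefore has to keep the library's internal call out of this global search.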

The Solution

How might we prevent code within our shared library from calling symbols within our application? Changing the link order reverses the problem.

g++ -s -L. -o main.exe -lconflict -lmain

export LD_LIBRARY_PATH=.
./main.exe

start
conflict
layer
conflict
finished

Note the difference between linking with static and shared libraries. Had conflict been a static library, linking would have failed with "undefined reference to DoLayer()" because it doesn't come after main on the link line.

rm libconflict.so
ar rc libconflict.a layer.o conflict.o
g++ -s -L. -o main.exe -lconflict -lmain

./libmain.a(main.o): In function `main':
main.cxx:(.text+0x95): undefined reference to `DoLayer()'
collect2: ld returned 1 exit status

And as expected, the reverse order for static libraries means the left-most symbol is used by all parties.

g++ -s -L. -o main.exe -lmain -lconflict

./main.exe

start
work
layer
work
finished

But this difference presents a solution. If we use the shared library link order that gives preference to symbols from the shared library and we hide those symbols from the application then we get the desired behaviour.

rm libconflict.a
g++ -shared layer.o conflict.o -o libconflict.so

Command Usage:

nm

 -C  Decode (demangle) low-level symbol names into user-level names.
 -D  Display the dynamic symbols rather than the normal symbols, which is 
     only meaningful for shared libraries.
 -g  Display only external symbols.
 -a  Display all symbols, even debugger-only symbols.
 
Symbol Types

 t  - symbol is in the text (code) section              - local
 T  - symbol is in the text (code) section              - global/external
 U  - symbol is undefined                               - global/external
 b  - symbol is in the uninitialized data (BSS) section - local
 w  - symbol is a weak symbol that has not been specifically tagged as a weak object symbol
 A  - symbol value won't be changed by further linking  - global/external

Looking at the results.

nm -C layer.o

00000046 t global constructors keyed to _Z7DoLayerv
00000000 t __static_initialization_and_destruction_0(int, int)
00000072 T DoLayer()
         U DoThing()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
00000000 b std::__ioinit
         U __cxa_atexit
         U __dso_handle
         U __gxx_personality_v0
0000005e t __tcf_0
         U puts

nm -C conflict.o

00000046 t global constructors keyed to _Z7DoThingv
00000000 t __static_initialization_and_destruction_0(int, int)
00000072 T DoThing()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
00000000 b std::__ioinit
         U __cxa_atexit
         U __dso_handle
         U __gxx_personality_v0
0000005e t __tcf_0
         U puts

nm -CD libconflict.so

         w _Jv_RegisterClasses
0000068e T DoLayer()
0000071a T DoThing()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
00001a00 A __bss_start
         U __cxa_atexit
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
00001a00 A _edata
00001a10 A _end
00000764 T _fini
000004e0 T _init
         U puts

Thus both DoThing() and DoLayer() are exported by our shared library, but we don't want DoThing() exported.

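Using -fvisibility=hidden during the packaging of the .so has no effect, because the flag only influences code generation and the object files were already compiled without it.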

g++ -shared -fvisibility=hidden layer.o conflict.o -o libconflict.so

nm -CD libconflict.so

         w _Jv_RegisterClasses
0000068e T DoLayer()
0000071a T DoThing()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
00001a00 A __bss_start
         U __cxa_atexit
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
00001a00 A _edata
00001a10 A _end
00000764 T _fini
000004e0 T _init
         U puts

It's not immediately obvious, but using -fvisibility=hidden during compile will affect the final .so.

g++ -fvisibility=hidden -c layer.cxx -o layer.o
g++ -fvisibility=hidden -c conflict.cxx -o conflict.o 
g++ -shared layer.o conflict.o -o libconflict.so

nm -C layer.o

00000046 t global constructors keyed to _Z7DoLayerv
00000000 t __static_initialization_and_destruction_0(int, int)
00000072 T DoLayer()
         U DoThing()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
00000000 b std::__ioinit
         U __cxa_atexit
         U __dso_handle
         U __gxx_personality_v0
0000005e t __tcf_0
         U puts

nm -C conflict.o

00000046 t global constructors keyed to _Z7DoThingv
00000000 t __static_initialization_and_destruction_0(int, int)
00000072 T DoThing()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
00000000 b std::__ioinit
         U __cxa_atexit
         U __dso_handle
         U __gxx_personality_v0
0000005e t __tcf_0
         U puts

As far as we can tell from nm, the -fvisibility=hidden switch did not affect the produced .o files (nm does not display ELF visibility; readelf -s would show the symbols marked HIDDEN). It has changed the resulting .so, though: the DoLayer and DoThing symbols are no longer exported.

nm -CD libconflict.so

         w _Jv_RegisterClasses
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
000019b0 A __bss_start
         U __cxa_atexit
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
000019b0 A _edata
000019c0 A _end
00000714 T _fini
00000490 T _init
         U puts

The result is that our application can't link because we have hidden every symbol, including DoLayer().

g++ -s -L. -o main.exe -lconflict -lmain

./libmain.a(main.o): In function `main':
main.cxx:(.text+0x95): undefined reference to `DoLayer()'
collect2: ld returned 1 exit status

We must explicitly allow the subset of symbols that we want to export.

layer.cxx

#include <iostream>
void DoThing();

__attribute__ ((visibility ("default"))) void DoLayer()
{
  printf("layer \n");
  DoThing();
}

Compile and link.

g++ -fvisibility=hidden -c layer.cxx -o layer.o
g++ -fvisibility=hidden -c conflict.cxx -o conflict.o 
g++ -shared layer.o conflict.o -o libconflict.so

nm -CD libconflict.so

         w _Jv_RegisterClasses
0000065e T DoLayer()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
000019d0 A __bss_start
         U __cxa_atexit
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
000019d0 A _edata
000019e0 A _end
00000734 T _fini
000004b8 T _init
         U puts

g++ -s -L. -o main.exe -lconflict -lmain

export LD_LIBRARY_PATH=.
./main.exe

start
work
layer
conflict
finished

That's good, but we can do better. Consider the details on gcc.gnu.org and stackoverflow.

We might use -s to exclude the debugging symbols, but they already don't appear in our shared library, so it appears to make no difference.

nm -CDa libconflict.so

         w _Jv_RegisterClasses
0000065e T DoLayer()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
000019d0 A __bss_start
         U __cxa_atexit
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
000019d0 A _edata
000019e0 A _end
00000734 T _fini
000004b8 T _init
         U puts

nm -Ca layer.o

00000000 b .bss
00000000 n .comment
00000000 d .ctors
00000000 d .data
00000000 r .eh_frame
00000000 n .note.GNU-stack
00000000 r .rodata
00000000 t .text
00000046 t global constructors keyed to _Z7DoLayerv
00000000 t __static_initialization_and_destruction_0(int, int)
00000072 T DoLayer()
         U DoThing()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
00000000 b std::__ioinit
         U __cxa_atexit
         U __dso_handle
         U __gxx_personality_v0
0000005e t __tcf_0
00000000 a layer.cxx
         U puts

Adding the -a switch didn't show any new symbols and compiling with the -s switch doesn't reduce the symbol count.

g++ -s -fvisibility=hidden -c layer.cxx -o layer.o

nm -Ca layer.o

00000000 b .bss
00000000 n .comment
00000000 d .ctors
00000000 d .data
00000000 r .eh_frame
00000000 n .note.GNU-stack
00000000 r .rodata
00000000 t .text
00000046 t global constructors keyed to _Z7DoLayerv
00000000 t __static_initialization_and_destruction_0(int, int)
00000072 T DoLayer()
         U DoThing()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
00000000 b std::__ioinit
         U __cxa_atexit
         U __dso_handle
         U __gxx_personality_v0
0000005e t __tcf_0
00000000 a layer.cxx
         U puts

Certainly it doesn't hurt to use -s and, as suggested, we probably also want to use -fvisibility-inlines-hidden.

g++ -fvisibility=hidden -fvisibility-inlines-hidden -s -c layer.cxx -o layer.o
g++ -fvisibility=hidden -fvisibility-inlines-hidden -s -c conflict.cxx -o conflict.o 
g++ -fvisibility=hidden -fvisibility-inlines-hidden -s -shared layer.o conflict.o -o libconflict.so

nm -CDa libconflict.so

         w _Jv_RegisterClasses
0000065e T DoLayer()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
000019d0 A __bss_start
         U __cxa_atexit
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
000019d0 A _edata
000019e0 A _end
00000734 T _fini
000004b8 T _init
         U puts

g++ -s -L. -o main.exe -lconflict -lmain

export LD_LIBRARY_PATH=.
./main.exe

start
work
layer
conflict
finished

Note that this solution seems to eliminate the earlier dependency on the link line order.

g++ -s -L. -o main.exe -lmain -lconflict

export LD_LIBRARY_PATH=.
./main.exe

start
work
layer
conflict
finished

That's all well and good, but it might be a burden to add the __attribute__ decorations to a large ancient codebase that suddenly needs to be used with another codebase but can't due to collision problems.
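
If editing every declaration is too much, GCC also accepts #pragma GCC visibility push/pop, which sets the visibility for all declarations between the two pragmas. The sketch below assumes it is compiled with -fvisibility=hidden, and the functions return strings instead of printing so the result is easy to check. It is one possible arrangement, not something the codebase in question used.

```cpp
#include <string>

// Public API: everything declared between push and pop gets default visibility.
#pragma GCC visibility push(default)
std::string DoLayer();
#pragma GCC visibility pop

// Internal helper: stays hidden when the file is compiled with -fvisibility=hidden.
std::string DoThing();

std::string DoThing() { return "conflict"; }

// Calls the library's own DoThing(); a hidden symbol cannot be interposed.
std::string DoLayer() { return "layer " + DoThing(); }
```

In a real library the push/pop pair would normally wrap the declarations in the public headers, which still means touching files, though only once per header rather than once per function.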

stackoverflow describes how --version-script can be used instead of the __attribute__ decorations, but it only works in certain scenarios.

test.c

int foo() { return 1; }
int bar() { return 2; }

test.export

{
 global: bar;
 local: *;
};

Compile and link and inspect symbols.

gcc -c test.c -o test.o
gcc -Wl,--version-script=test.export -shared test.o -o test.so
nm -CDa test.so

         w _Jv_RegisterClasses
         w __cxa_finalize
         w __gmon_start__
00000316 T bar

That fails if g++ is used instead of gcc.

g++ -c test.c -o test.o
g++ -Wl,--version-script=test.export -shared test.o -o test.so
nm -CDa test.so

         w _Jv_RegisterClasses
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0

It also fails if the file is named .cxx instead of .c, because gcc then compiles it as C++.

cp test.c test.cxx

gcc -c test.cxx -o test.o
gcc -Wl,--version-script=test.export -shared test.o -o test.so
nm -CDa test.so

         w _Jv_RegisterClasses
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0

ftp.gnu.org only talks about C. sourceware.org talks about C++, but it's not clear what linker they're using. cygwin.com shows a more complicated syntax.

{
    global:
        extern "C++" {
           this_plugin*;
           *BaseLayer::*;
           *DPlugin;
           *BasePlugin;
           *MyPlugin::*;
           *MyLayer::*;
        };
    local:
        *;
};

Using * on both sides of the name works, so the issue must be name mangling.

test.export

{
 global: *bar*;
 local: *;
};

Now we have success.

gcc -c test.cxx -o test.o
gcc -Wl,--version-script=test.export -shared test.o -o test.so
nm -CDa test.so

         w _Jv_RegisterClasses
00000336 T bar()
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0

nm -D test.so

         w _Jv_RegisterClasses
00000336 T _Z3barv
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
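
The pairing that nm -CD and nm -D show for the same symbol can also be reproduced from code with abi::__cxa_demangle from <cxxabi.h>, which GCC's C++ runtime provides. The helper name Demangle is made up for this sketch, and it deliberately only attempts names that start with _Z.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Returns the demangled form of an Itanium-ABI mangled name, or the input
// unchanged when it does not look mangled or cannot be demangled.
std::string Demangle(const std::string& mangled) {
  if (mangled.compare(0, 2, "_Z") != 0) return mangled;  // plain C-style name
  int status = 0;
  char* demangled = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr) return mangled;
  std::string result(demangled);
  std::free(demangled);  // the buffer is malloc-allocated by __cxa_demangle
  return result;
}
```

Calling Demangle("_Z3barv") yields bar(), the same pair that the two nm listings show, which is why a version script entry of plain bar no longer matches once the file is compiled as C++.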

You don't need the wildcards if you provide the full mangled name.

test.export

{
 global: _Z3barv;
 local: *;
};

Compile and link.

gcc -c test.cxx -o test.o
gcc -Wl,--version-script=test.export -shared test.o -o test.so
nm -CDa test.so

         w _Jv_RegisterClasses
00000336 T bar()
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0

nm -D test.so

         w _Jv_RegisterClasses
00000336 T _Z3barv
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0

In this way it also works with g++.

g++ -c test.cxx -o test.o
g++ -Wl,--version-script=test.export -shared test.o -o test.so
nm -CDa test.so

         w _Jv_RegisterClasses
00000386 T bar()
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
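
Because the mismatch is purely a matter of name mangling, another option, when you are free to change the source, is to give the exported function C linkage so that its symbol name stays unmangled and a plain entry such as global: bar; matches again. This is a sketch of that idea using the same two functions. It was not part of the experiment above, and C linkage rules out overloading for that function.

```cpp
// C linkage: the symbol is emitted as plain "bar", so "global: bar;" matches it.
extern "C" int bar() { return 2; }

// C++ linkage: the symbol is emitted as "_Z3foov" under the Itanium C++ ABI
// that g++ uses on Linux, so it needs the mangled name or a wildcard.
int foo() { return 1; }
```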

Returning to the original problem, you can compile without visibility flags and without source code decorations.

g++ -s -c layer.cxx -o layer.o
g++ -s -c conflict.cxx -o conflict.o 

Then find the mangled symbols.

nm -g layer.o

00000072 T _Z7DoLayerv
         U _Z7DoThingv
         U _ZNSt8ios_base4InitC1Ev
         U _ZNSt8ios_base4InitD1Ev
         U __cxa_atexit
         U __dso_handle
         U __gxx_personality_v0
         U puts

Then produce an export script.

export.txt

{
 global: _Z7DoLayerv;
 local: *;
};

Then construct a shared library that exports only the required API.

g++ -Wl,--version-script=export.txt -s -shared layer.o conflict.o -o libconflict.so
nm -CDa libconflict.so

         w _Jv_RegisterClasses
000005ce T DoLayer()
         U std::ios_base::Init::Init()
         U std::ios_base::Init::~Init()
         U __cxa_atexit
         w __cxa_finalize
         w __gmon_start__
         U __gxx_personality_v0
         U puts

Then link your application.

g++ -s -L. -o main.exe -lconflict -lmain

export LD_LIBRARY_PATH=.
./main.exe

start
work
layer
conflict
finished

And you don't have to worry about order on the link line.

g++ -s -L. -o main.exe -lmain -lconflict

export LD_LIBRARY_PATH=.
./main.exe

start
work
layer
conflict
finished

Finally, if your exported symbols are sufficiently unique, instead of looking up the mangled names you could wrap each name in asterisks.

export.txt

{
 global: *DoLayer*;
 local: *;
};

With Classes

As described by akkadia.org and discovered during the Solaris Solution, the best practice is to use -fvisibility=hidden during compile from .cxx to .o, with __attribute__ ((visibility ("default"))) in the source to indicate just those functions that outsiders will be allowed to call.

The problem with that is you might have to update hundreds of source files. If you're lucky and someone already did this for Windows __declspec, and you're extra lucky and they did it with a macro, then you might get away with a minor tweak like this. Unfortunately it's a bit more complicated on Solaris.

#ifndef MYDLLEXPORT
 #if defined(WIN32)
  #define MYDLLEXPORT __declspec(dllexport)
 #elif defined(__linux__)  /* plain "linux" is not defined in strict ISO modes such as -std=c++11 */
  #define MYDLLEXPORT __attribute__ ((visibility ("default")))
 #else
  #define MYDLLEXPORT
 #endif
#endif
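
A variant of the same idea keys the macro off the compiler rather than the operating system and adds a second macro for members that should stay internal. This is only a sketch: MYLIB_PUBLIC and MYLIB_LOCAL are hypothetical names, and Helper() is an invented private member used to show the hidden case.

```cpp
#if defined(_WIN32) || defined(__CYGWIN__)
  #define MYLIB_PUBLIC __declspec(dllexport)
  #define MYLIB_LOCAL
#elif defined(__GNUC__) && __GNUC__ >= 4
  // GCC 4 and later (and compatible compilers) understand the visibility attribute.
  #define MYLIB_PUBLIC __attribute__((visibility("default")))
  #define MYLIB_LOCAL __attribute__((visibility("hidden")))
#else
  #define MYLIB_PUBLIC
  #define MYLIB_LOCAL
#endif

// The class is exported as a whole; one private member is explicitly kept hidden.
class MYLIB_PUBLIC MyClass {
public:
  int DoLayer();

private:
  MYLIB_LOCAL int Helper();
};

int MyClass::Helper() { return 41; }
int MyClass::DoLayer() { return Helper() + 1; }
```

The hidden marker only matters for members that callers outside the library never reference directly, which is why it sits on a private function here.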

Let's start with a simple demo.

main.cxx

#include <iostream>
#include "layer.h"
void DoThing();
int main()
{
  printf("start \n");
  DoThing();
  MyClass m;
  m.DoLayer();
  printf("finished \n");
  return 0;
}

work.cxx

#include <iostream>
void DoThing()
{
  printf("work \n");
}

layer.h

class MyClass
{
public:
  void DoLayer();
};

layer.cxx

#include <iostream>
#include "layer.h"
void DoThing();
void MyClass::DoLayer()
{
  printf("layer \n");
  DoThing();
}

conflict.cxx

#include <iostream>
void DoThing()
{
  printf("conflict \n");
}

This demo shows the symbol conflict problem where the layer is implemented as a class.

g++ -c layer.cxx -o layer.o
g++ -c conflict.cxx -o conflict.o
g++ -shared layer.o conflict.o -o libconflict.so

g++ -c main.cxx -o main.o
g++ -c work.cxx -o work.o
ar rc libmain.a main.o work.o

g++ -s -L. -o main.exe -lmain -lconflict

export LD_LIBRARY_PATH=.
./main.exe

start
work
layer
work
finished

Now let's solve the problem with the -fvisibility=hidden and source __attribute__ method.

layer.cxx

#include <iostream>
#include "layer.h"
void DoThing();
__attribute__ ((visibility ("default"))) void MyClass::DoLayer()
{
  printf("layer \n");
  DoThing();
}

g++ -fvisibility=hidden -c layer.cxx -o layer.o
g++ -fvisibility=hidden -c conflict.cxx -o conflict.o
g++ -shared layer.o conflict.o -o libconflict.so
g++ -s -L. -o main.exe -lmain -lconflict

./main.exe

start
work
layer
conflict
finished

nm -C libconflict.so | grep Layer

00000642 t global constructors keyed to _ZN7MyClass7DoLayerEv
0000066e T MyClass::DoLayer()

The above demonstrates that putting the default attribute on a class function implementation solves the symbol conflict.

But can we do the same from the header?

layer.cxx

#include <iostream>
#include "layer.h"
void DoThing();
void MyClass::DoLayer()
{
  printf("layer \n");
  DoThing();
}

layer.h

class MyClass
{
public:
  __attribute__ ((visibility ("default"))) void DoLayer();
};

g++ -fvisibility=hidden -c layer.cxx -o layer.o
g++ -fvisibility=hidden -c conflict.cxx -o conflict.o
g++ -shared layer.o conflict.o -o libconflict.so
g++ -s -L. -o main.exe -lmain -lconflict

./main.exe

start
work
layer
conflict
finished

nm -C libconflict.so | grep Layer

00000642 t global constructors keyed to _ZN7MyClass7DoLayerEv
0000066e T MyClass::DoLayer()

The above demonstrates that putting the default attribute on a class function declaration also solves the symbol conflict.

But can we do the same from the entire class?

layer.h

class __attribute__ ((visibility ("default"))) MyClass
{
public:
  void DoLayer();
};

g++ -fvisibility=hidden -c layer.cxx -o layer.o
g++ -fvisibility=hidden -c conflict.cxx -o conflict.o
g++ -shared layer.o conflict.o -o libconflict.so
g++ -s -L. -o main.exe -lmain -lconflict

./main.exe

start
work
layer
conflict
finished

nm -C libconflict.so | grep Layer

00000642 t global constructors keyed to _ZN7MyClass7DoLayerEv
0000066e T MyClass::DoLayer()

Super. It looks like a simple fix. Just add the __attribute__ to a few hundred classes and you're good to go. And if your classes already look like the following, then you're just looking at a one-file change.

class MYDLLEXPORT MyClass
{
public:
  void MyFunc();
};