
Commit 87f35c04 authored by Benjamin Paaßen

added all source code and all JSON tree data. Also extended the README file with installation guidelines, the sklearn dependency, and a table of contents.
parent 363b124a
.ipynb_checkpoints
__pycache__
/build
*.c
*.so
@@ -41,6 +41,16 @@ via backtracing ([Paaßen, 2018][1]).
These methods are implemented in `adversarial_edits.construct_random_adversarial`
and `adversarial_edits.construct_adversarial`, respectively.
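To make the data format concrete: throughout this package, trees are passed around as a node-label list plus an adjacency list of child indices, as visible in the test files and JSON data of this commit. A minimal sketch, with a hypothetical `relabel` helper standing in for the actual edit classes in `tree_edits.py`:

```python
# Trees are encoded as a list of node labels plus an adjacency list of
# child indices; this pair encodes the chain A(A(B(A(A)))).
x_nodes = ['A', 'A', 'B', 'A', 'A']
x_adj = [[1], [2], [3], [4], []]

def relabel(nodes, i, new_label):
    # Hypothetical stand-in for a single replacement edit; the actual
    # edit classes live in tree_edits.py.
    out = list(nodes)
    out[i] = new_label
    return out

z_nodes = relabel(x_nodes, 1, 'B')
print(z_nodes)  # ['A', 'B', 'B', 'A', 'A']
```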
## Installation and setup
To set up this package, you need to

1. install all dependencies listed below (except for `ptk`, which is bundled
   with this repository), and
2. run `python3 setup.py build_ext --inplace` to compile the Cython sources.

Then, every function should run.
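Assuming a standard Python 3 environment with `pip` available, the setup steps above might look as follows (the PyPI package names are an assumption; `ptk` is omitted because an adapted copy ships in this repository):

```shell
# 1. install the dependencies listed below (PyPI names assumed)
pip3 install numpy scipy scikit-learn torch cython
# 2. compile the Cython sources in place
python3 setup.py build_ext --inplace
```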
## Reproduce the results in the paper
To reproduce the results presented in the paper, you merely have to run the
@@ -70,9 +80,10 @@ D = multiprocess.pairwise_distances_symmetric(X, X)
As dependencies, this package requires [numpy](http://www.numpy.org/) for
general array handling, [scipy](https://scipy.org/) for eigenvalue decomposition
and statistical tests, [sklearn](https://scikit-learn.org/stable/) for
support vector machines, [pytorch](https://pytorch.org/) for recursive neural
networks, [cython](https://cython.org/) for fast tree edit distance
computations, and [ptk][2] for tree kernel computations. Note that the latter
package is not available via pip and is written in Python 2, so we include an
adapted Python 3 version here in the subfolder `ptk`.
@@ -84,8 +95,54 @@ contained alongside this documentation is licensed under the
[GNU General Public License Version 3](https://www.gnu.org/licenses/gpl-3.0.en.html).
A copy of this license is contained in the `gpl-3.0.md` file alongside this README.
## Contents
The detailed contents of this package are the following:
* `adversarial_edits.py` : Implements adversarial edit attacks.
* `adversarial_edits_test.py` : Provides test functions for `adversarial_edits.py`.
* `cystic` : Contains the Cystic data set in JSON format.
* `Cystic.ipynb` : Contains the Cystic experiment.
* `gpl-3.0.md` : Contains the GPLv3 license.
* `hyperopt.py` : Implements hyperparameter optimization for SVMs and Tree
  Echo State Networks.
* `hyperopt_test.py` : Provides test functions for `hyperopt.py`.
* `leukemia` : Contains the Leukemia data set in JSON format.
* `Leukemia.ipynb` : Contains the Leukemia experiment.
* `minipalindrome` : Contains the MiniPalindrome data set in JSON format.
* `MiniPalindrome.ipynb` : Contains the MiniPalindrome experiment.
* `ptk` : Contains a Python 3-compatible version of Giovanni Da San Martino's
  [python tree kernel (ptk) toolbox][2].
* `ptk_utils.py` : Contains interface functions for the ptk toolbox.
* `README.md` : This file.
* `recursive_net.py` : Implements recursive neural networks
([Sperduti & Starita, 1997][3]) in [pytorch](https://pytorch.org/).
* `recursive_net_test.py` : Provides test functions for `recursive_net.py`.
* `results` : Contains experimental results.
* `Resulty.ipynb` : Evaluates the experimental results.
* `setup.py` : A helper script to compile the `ted.pyx` file using
[cython](https://cython.org/).
* `sorting` : Contains the Sorting data set in JSON format.
* `Sorting.ipynb` : Contains the Sorting experiment.
* `ted.pyx` : Implements the tree edit distance and its backtracing following
[Paaßen (2018)][1].
* `ted_test.py` : Provides test functions for `ted.pyx`.
* `trace.py` : Contains utility classes for tree edit distance backtracing.
* `tree_echo_state.py` : Implements Tree Echo State Networks
  ([Gallicchio & Micheli, 2013][4]).
* `tree_echo_state_test.py` : Provides test functions for `tree_echo_state.py`.
* `tree_edits.py` : Implements tree edits as described in the paper.
* `tree_edits_test.py` : Provides test functions for `tree_edits.py`.
* `tree_utils.py` : Provides utility functions for tree processing.
* `tree_utils_test.py` : Provides test functions for `tree_utils.py`.
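The JSON data sets listed above all use the same layout: one tree per file, given as a `nodes` list of labels and an `adj` list of child indices. A minimal sketch of reading and pretty-printing such a tree (the inline JSON string is a toy example, not one of the actual data files):

```python
import json

# Each data set stores one tree per JSON file as a node-label list plus an
# adjacency list of child indices (the same layout the test code uses).
tree = json.loads('{"nodes": ["A", "A", "B"], "adj": [[1], [2], []]}')
nodes, adj = tree["nodes"], tree["adj"]

def to_string(i):
    # Render the subtree rooted at node i in A(B(C)) notation.
    children = [to_string(j) for j in adj[i]]
    return nodes[i] + ('(' + ', '.join(children) + ')' if children else '')

print(to_string(0))  # A(A(B))
```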
## Literature
* Paaßen, B. (2018). Revisiting the tree edit distance and its backtracing: A tutorial. [arXiv:1805.06869][1]
* Sperduti, A. & Starita, A. (1997). Supervised neural networks for the classification of structures. IEEE Transactions on Neural Networks, 8(3), 714-735. doi:[10.1109/72.572108][3]
* Gallicchio, C. & Micheli, A. (2013). Tree Echo State Networks. Neurocomputing, 101, 319-337. doi:[10.1016/j.neucom.2012.08.017][4]
[1]: https://arxiv.org/abs/1805.06869 "Paaßen, B. (2018). Revisiting the tree edit distance and its backtracing: A tutorial. arXiv:1805.06869."
[2]: http://www.joedsm.altervista.org/pythontreekernels.htm "Python tree kernels, as provided by Giovanni Da San Martino."
[3]: http://doi.org/10.1109/72.572108 "Sperduti, A. & Starita, A. (1997). Supervised neural networks for the classification of structures. IEEE Transactions on Neural Networks, 8(3), 714-735."
[4]: http://doi.org/10.1016/j.neucom.2012.08.017 "Gallicchio, C. & Micheli, A. (2013). Tree Echo State Networks. Neurocomputing, 101, 319-337."
#!/usr/bin/python3
"""
Tests adversarial edit construction
Copyright (C) 2019
Benjamin Paaßen
AG Machine Learning
Bielefeld University
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
"""
import unittest
import numpy as np
import tree_edits
import tree_utils
import adversarial_edits
__author__ = 'Benjamin Paaßen'
__copyright__ = 'Copyright 2019, Benjamin Paaßen'
__license__ = 'GPLv3'
__version__ = '1.0.0'
__maintainer__ = 'Benjamin Paaßen'
__email__ = 'bpaassen@techfak.uni-bielefeld.de'
class TestAdversarialEdits(unittest.TestCase):

    def test_construct_adversarial(self):
        # consider the example of counting the number of 'A's in a tree.
        # The true label is +1 if the number of 'A's is bigger than the number
        # of 'B's and -1 otherwise. However, we have a classifier
        # that always returns +1 if the number of 'B's is smaller than 2,
        # which works on the training data. In particular, consider the
        # following two trees
        x_nodes = ['A', 'A', 'B', 'A', 'A']
        x_adj = [[1], [2], [3], [4], []]
        y_nodes = ['A', 'B', 'B']
        y_adj = [[1], [2], []]
        y_label = -1
        # and the following classifier
        def classifier(x_nodes, x_adj):
            if x_nodes.count('B') < 2:
                return +1
            else:
                return -1
        # then, the adversarial example should replace one A with a B,
        # resulting in a tree like A(B(B(A(A)))), which would have the
        # 'true' label +1, but has the predicted label -1.
        # Note that our construction has no clue about the 'true' label;
        # it just works based on edit scripts between training data points.
        z_nodes, z_adj, script, label = adversarial_edits.construct_adversarial(
            x_nodes, x_adj, +1, y_nodes, y_adj,
            classifier)
        self.assertEqual(-1, classifier(z_nodes, z_adj))
        self.assertEqual(2, z_nodes.count('B'))
        self.assertEqual(1, len(script))
        self.assertEqual(-1, label)

    def test_construct_adversarials(self):
        # consider the same example as above
        x_nodes = ['A', 'A', 'B', 'A', 'A']
        x_adj = [[1], [2], [3], [4], []]
        y_nodes = ['A', 'B', 'B']
        y_adj = [[1], [2], []]
        X = [(x_nodes, x_adj), (y_nodes, y_adj)]
        D = np.array([[0, 3], [3, 0]])
        Y = [+1, -1]
        # with the following classifier
        def classifier(x_nodes, x_adj):
            if x_nodes.count('B') < 2:
                return +1
            else:
                return -1
        Z, labels, ds = adversarial_edits.construct_adversarials(X, D, Y, Y, classifier)
        self.assertEqual(-1, labels[0])
        self.assertEqual(0.5, ds[0])
        self.assertEqual(+1, labels[1])
        self.assertEqual(0.5, ds[1])

    def test_construct_random_adversarial(self):
        # consider the example of counting the number of 'A's in a tree.
        # The true label is +1 if the number of 'A's is bigger than the number
        # of 'B's and -1 otherwise. However, we have a classifier
        # that always returns +1 if the number of 'B's is smaller than 2,
        # which works on the training data. In particular, consider the
        # following tree
        x_nodes = ['A', 'A', 'B', 'A', 'A']
        x_adj = [[1], [2], [3], [4], []]
        x_label = 1
        # and the following classifier
        def classifier(x_nodes, x_adj):
            if x_nodes.count('B') < 2:
                return +1
            else:
                return -1
        # then, the adversarial example should add a B at some point.
        # Note that our construction has no clue about the 'true' label;
        # it just works based on edit scripts between training data points.
        z_nodes, z_adj, script, label = adversarial_edits.construct_random_adversarial(
            x_nodes, x_adj, x_label, classifier, ['A', 'B'])
        self.assertEqual(-1, label)
        self.assertEqual(2, z_nodes.count('B'))
        self.assertTrue(len(script) >= 1)

    def test_construct_random_adversarials(self):
        # perform the same test as above
        x_nodes = ['A', 'A', 'B', 'A', 'A']
        x_adj = [[1], [2], [3], [4], []]
        y_nodes = ['A', 'B', 'B']
        y_adj = [[1], [2], []]
        X = [(x_nodes, x_adj), (y_nodes, y_adj)]
        Y = [+1, -1]
        # and the following classifier
        def classifier(x_nodes, x_adj):
            if x_nodes.count('B') < 2:
                return +1
            else:
                return -1
        Z, labels, ds = adversarial_edits.construct_random_adversarials(X, Y, [+1, +1], classifier, ['A', 'B'])
        np.testing.assert_array_equal([-1, 0], labels)
        self.assertEqual(2, len(Z))
        self.assertTrue(Z[0][0].count('B') >= 2)
        self.assertTrue(Z[1] is None)
        self.assertEqual(2, len(ds))

    def test_binary_search(self):
        # construct a trivial start tree
        x_nodes = ['A']
        x_adj = [[]]
        # construct a trivial script where we only add As
        n = 10
        script = []
        for i in range(n-1):
            script.append(tree_edits.Insertion(i, 0, 'A'))
        # use a classifier which switches the label to 1 if we have
        # more than m As
        m = n-1
        def classifier(nodes, adj):
            if nodes.count('A') > m:
                return +1
            else:
                return -1
        # accordingly, our expected script adds precisely m As
        expected_script = tree_edits.Script(script[:m])
        z_nodes_expected, z_adj_expected = expected_script.apply(x_nodes, x_adj)
        self.assertEqual(-1, classifier(x_nodes, x_adj))
        self.assertEqual(+1, classifier(z_nodes_expected, z_adj_expected))
        # perform the binary search
        z_nodes_actual, z_adj_actual, actual_script, label = adversarial_edits._binary_search(x_nodes, x_adj, -1, script, classifier)
        self.assertEqual(z_nodes_expected, z_nodes_actual)
        self.assertEqual(z_adj_expected, z_adj_actual)
        self.assertEqual(expected_script, actual_script)
        self.assertEqual(+1, label)


if __name__ == '__main__':
    unittest.main()
{
"nodes": [
"AA_",
"GalNAc_1a",
"NeuAc_2a6",
"$"
],
"adj": [
[
1
],
[
2
],
[
3
],
[]
]
}
{
"nodes": [
"GlcNAc_",
"Fuc_1a3",
"$",
"Gal_1b4",
"NeuAc_2a3",
"$"
],
"adj": [
[
1,
3
],
[
2
],
[],
[
4
],
[
5
],
[]
]
}
{
"nodes": [
"GalNAc_",