# Repo Diff Trees

repo_diff_trees.py compares two repo source trees and outputs reports on the
findings.

The output is in CSV and is easily consumable in a spreadsheet.

In addition to importing to a spreadsheet, you can also create your own
Data Studio dashboard like [this one](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).

If you wish to create your own dashboard, follow the instructions below:

1. Sync the two repo workspaces you wish to compare. Example:

```
mkdir android-8.0.0_r1
cd android-8.0.0_r1
repo init \
  --manifest-url=https://android.googlesource.com/platform/manifest \
  --manifest-branch=android-8.0.0_r1
# Adjust the number of parallel jobs to your needs
repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
cd ..
mkdir android-8.0.0_r11
cd android-8.0.0_r11
repo init \
  --manifest-url=https://android.googlesource.com/platform/manifest \
  --manifest-branch=android-8.0.0_r11
# Adjust the number of parallel jobs to your needs
repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
cd ..
```

2. Run repo_diff_trees.py. Example:

```
python repo_diff_trees.py --exclusions_file=android_exclusions.txt \
  android-8.0.0_r1 android-8.0.0_r11
```

3. Create a [new Google spreadsheet](https://docs.google.com/spreadsheets/create).
4. Import projects.csv to a new sheet.
5. Create a [new data source in Data Studio](https://datastudio.google.com/datasources/create).
6. Connect your new data source to the projects.csv sheet in the Google spreadsheet.
7. Add a "Count Diff Status" field by selecting the menu next to the "Diff
   Status" field and selecting "Count".
8. Copy the [Data Studio dashboard sample](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).
   Make sure you are logged into your Google account and have agreed to Data Studio's terms of service. Once
   this is done you should get a link to "Make a copy of this report".
9. Select your own data source for your copy of the dashboard when prompted.
10. You may see a "Configuration Incomplete" message under
    the "Modified Projects" pie chart. To address this, select the pie chart,
    then replace the "Invalid Metric" field with "Count Diff Status".

## Analysis method

repo_diff_trees.py goes through several stages when comparing two repo
source trees:

1. Match projects in source tree A with projects in source tree B.
2. Diff projects that have a match.
3. Find commits in source tree B that are not in source tree A.

The first two steps are self-explanatory. The method
of finding commits only in B is explained below.

## Finding commits not upstream

After matching up projects in both source trees
and diffing, the last stage is to iterate
through each matching project pair and find
the commits that exist in the downstream project (B) but not the
upstream project (A).

'git cherry' is a useful tool that finds changes
which exist in one branch but not another. It does so
not only by finding which commits were merged
to both branches, but also by matching cherry-picked
commits.

However, there are many instances where a change in one branch
can have an equivalent in another branch without being a merge
or a cherry pick. Some examples are:

* Commits that were squashed with other commits
* Commits that were reauthored

'git cherry' will not recognize these commits as having an equivalent,
yet they clearly do.

This is addressed in three steps:

1. First, list the "git cherry" commits. This gives us the
   list of changes for which "git cherry" could not find an equivalent.
2. Then "git blame" the entire project's source tree and compile
   a list of changes that actually have lines of code in the tree.
3. Finally, find the intersection: 'git cherry' changes
   that have lines of code in the final source tree.


## Caveats

The method described above has proven effective on Android
source trees. It does have shortcomings.

* It does not find commits that only delete lines of code.
* It does not take into account merge conflict resolutions.
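
The three steps described under "Finding commits not upstream" can be illustrated with a minimal Python sketch. This is not the code used by repo_diff_trees.py. The function names (`parse_cherry`, `parse_blame`, `commits_not_upstream`, `run_git`, `find_in_project`) are hypothetical, and the sketch assumes both the upstream and downstream revisions are reachable from a single git repository. It also ignores submodules and other edge cases.

```python
import re
import subprocess

# A `git blame --porcelain` header line looks like:
#   <40-hex sha> <orig-line> <final-line> [<lines-in-group>]
_BLAME_HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+(?: \d+)?$")


def parse_cherry(output):
    """Return the SHAs that `git cherry` marked with '+', in order.

    A '+' means git found no equivalent change upstream.
    """
    return [line.split()[1] for line in output.splitlines()
            if line.startswith("+ ")]


def parse_blame(output):
    """Return the set of SHAs that own at least one line in blame output."""
    shas = set()
    for line in output.splitlines():
        match = _BLAME_HEADER.match(line)
        if match:
            shas.add(match.group(1))
    return shas


def commits_not_upstream(cherry_output, blame_outputs):
    """Intersect the '+' commits from git cherry with the blamed commits."""
    surviving = set()
    for blame_output in blame_outputs:
        surviving |= parse_blame(blame_output)
    return [sha for sha in parse_cherry(cherry_output) if sha in surviving]


def run_git(repo_dir, *args):
    """Run a git command in repo_dir and return its stdout as text."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *args],
        check=True, capture_output=True,
        encoding="utf-8", errors="replace",
    )
    return result.stdout


def find_in_project(repo_dir, upstream, downstream):
    """Apply the three steps to one project (assumes both refs are local)."""
    cherry = run_git(repo_dir, "cherry", upstream, downstream)
    listing = run_git(repo_dir, "ls-tree", "-r", "--name-only", "-z", downstream)
    blames = (run_git(repo_dir, "blame", "--porcelain", downstream, "--", path)
              for path in listing.split("\0") if path)
    return commits_not_upstream(cherry, blames)
```

Separating the parsing from the git invocations keeps the intersection logic easy to check on sample output. A commit that only deletes lines never appears in the blame output, which is why this approach misses it, as noted in the caveats.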