Analysis of Java Programming Language Decompiler Capabilities (Foreign-Language Translation Material)

2021-12-21 22:23:51

An Analysis on Java Programming Language Decompiler Capabilities

Konstantins Gusarovs*

Riga Technical University, Riga, Latvia

Abstract – Along with new artifact development, software engineering also includes other tasks. One of these tasks is the reverse engineering of binary artifacts. This task can be performed by using special “decompiler” software. In the present paper, the author performs a comparison of four different Java programming language decompilers that have been chosen based on both personal experience and results of a software developer survey.

Keywords – Decompilation, Java, reverse engineering

I. INTRODUCTION

While software development is usually about producing new artifacts, i.e., turning code written in some programming language into a binary distribution, sometimes it is necessary to perform the reverse operation, namely reverse engineering [1]. Reverse engineering is the process of extracting knowledge or design blueprints from anything man-made. In software engineering, this can be described as extracting source code from binary (compiled) files. Although at first sight such a process might seem to conflict with copyright, sometimes it is necessary. One example is the need to fix defects in a program or library that a company developed some time ago but whose source code is now missing. Another reason to reverse engineer software might be the need to obtain information, such as cryptographic keys, from libraries or programs whose source code is not available. Such cases, which involve reverse engineering one's own products, are valid and legal use cases. Another valid application of reverse engineering is the study of computer viruses [1] by the authors of anti-virus software, which is necessary to understand how malicious software works and how to counter it.

According to the TIOBE index [2], Java [3] is one of the most popular programming languages used in enterprise development. It is an object-oriented programming language whose code is compiled to bytecode instructions executed by a stack-based virtual machine. Building Java around bytecode instead of an assembly language offers several advantages; for example, code written once can be executed on different platforms, provided that a virtual machine implementation exists for each of them. From the reverse engineering point of view, this means that only bytecode needs to be processed, and bytecode contains fewer instructions than assembly languages do. Thus, to turn Java bytecode back into source code, a decompiler has to be able to transform around 200 different instructions [4] for Java 10, the latest version at the time of writing. Compared with, for example, the Intel processor instruction set [5], which contains around 2000 different instructions, this seems an easier task.

Such a task can be performed by software called a “decompiler” [1], which translates a binary artifact into source code with a certain degree of precision. Several decompilers exist for the Java programming language, and the goal of this paper is to compare them in order to provide recommendations for software developers.

This paper is structured as follows. Section II lists the selected Java decompilers. Section III gives a short introduction to the Java binary file format and bytecode instructions. Section IV gives several examples of the Java bytecode decompilation techniques used by decompilers. Section V describes a test case developed by the author. Section VI presents the results of decompiling the test case along with a short analysis. Section VII describes additional test results and compares the decompilers using additional criteria defined by the author. Finally, the last section draws conclusions and gives recommendations on Java decompiler software.

II. JAVA DECOMPILER SOFTWARE

Several decompilers exist for the Java programming language, and choosing one requires comparing them. In this section, the author lists such software. The list is based both on the author's personal experience with such software and on the results of a survey the author conducted at his current workplace to determine which tools other programmers would recommend for this task:

• JD Project [6] is a modular decompiler that can be run as a standalone application or integrated into development environments, such as Eclipse [7] or IntelliJ IDEA [8].

• CFR [9] is distributed as a library that contains a command line interface (CLI) and can also be used as part of other software.

• Procyon [10] is a framework that can be integrated into other applications and contains a CLI. Several graphical user interface (GUI) implementations exist for it.

• Fernflower [11] is a Java decompiler used in the IntelliJ IDEA [8] development environment. It is distributed as a library that also has a CLI.

The survey on Java decompiler software mentioned above was based on two questions:

• Which Java decompilers are you familiar with?

• Which Java decompiler would you recommend using?

A total of 247 people answered these questions. The results of the survey are shown in Tables I and II.


TABLE I

WHICH JAVA DECOMPILERS ARE YOU FAMILIAR WITH

Decompiler    Total answers
Fernflower    200
JD Project    158
Procyon       141
CFR           50
JAD           2

TABLE II

WHICH JAVA DECOMPILER WOULD YOU RECOMMEND USING

Decompiler    Total answers
Fernflower    121
JD Project    74
Procyon       44
CFR           8

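The survey results show that most respondents are familiar with the Fernflower decompiler and recommend using it. This can be explained by the fact that Fernflower is built into the development environment used at the company, namely IntelliJ IDEA [8]. Several other Java decompiler implementations exist. However, in most cases they are outdated and no longer supported. Therefore, this paper compares the four decompilers listed above.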

III. INTRODUCTION TO THE JAVA VIRTUAL MACHINE BINARY FILE FORMAT

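The Java Virtual Machine (JVM) works with binary .class files, which contain the result of compiling source code [12]. These files hold all the necessary information about a compilation unit (basically a Java class or interface), including: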

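• The version of the class file format, which reflects the compiler that produced the given .class file. This allows the JVM to detect whether it is able to load and execute the file.

• A constant pool containing the string literals, class and interface names, field and method names, and other constants used in the compilation unit.

• Access flags that determine the visibility and kind of the compilation unit. This information defines how the class contained in the file can be instantiated and subclassed.

• Information about the base class that the compilation unit inherits from and the interfaces it implements.

• Lists of fields and methods, along with their access flags and other modifiers.

• Attributes of the class, its fields and its methods, which provide additional information about these components of the compilation unit. One of these attributes is the actual code of a method. Other attributes represent various information that can be used at run time, such as annotations (syntactic metadata attached to a given member) or the list of exceptions that may be thrown while the method executes.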
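The following sketch makes the layout above more concrete. It reads the first fields of a .class file using only the standard java.io classes: the magic number, the minor and major version numbers, and the constant pool count. The JVM checks the major version to decide whether it can load the file, and major version 54 corresponds to Java 10. The class name ClassFileHeader is hypothetical. The sketch stops before the constant pool itself, because its variable-length entries would need a full parser.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Reads the fixed-size header that starts every .class file (hypothetical helper). */
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] classBytes) throws IOException {
        // Class files are big-endian, which matches DataInputStream's byte order.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        int magic = in.readInt();                 // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a class file: bad magic number");
        }
        int minor = in.readUnsignedShort();       // u2 minor_version
        int major = in.readUnsignedShort();       // u2 major_version (54 = Java 10)
        int cpCount = in.readUnsignedShort();     // u2 constant_pool_count (entries + 1)
        return new ClassFileHeader(minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassFileHeader path/to/Some.class
        ClassFileHeader h = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.printf("major=%d minor=%d constant_pool_count=%d%n",
                h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

A decompiler continues past these fields to parse the constant pool, access flags, fields, methods and attributes, in the order listed above.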

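The Code attribute contains the actual list of bytecode instructions that are executed when the corresponding method is invoked. As mentioned before, JVM bytecode contains about 200 instructions, which can be divided into the following groups: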

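• Mathematical operations – these instructions represent the actual arithmetic operations (e.g., the DADD instruction adds two double values) as well as loading constants onto the top of the stack (e.g., the ICONST_0 to ICONST_5 instructions push the integer constants 0 to 5).

• Stack operations – the JVM is a stack-based virtual machine, which means that it does not use any kind of registers. Instead, the values of local variables are loaded onto the stack and processed there. These instructions allow writing and reading the information on top of the stack (e.g., ALOAD pushes an object reference stored in a local variable onto the stack). They also allow creating new objects on top of the stack: NEW creates a new object, while NEWARRAY creates a new array of a given primitive type. Several instructions in this group are also used to duplicate values (DUP copies the value on top of the stack and pushes the copy) or to remove them from the stack without storing them in any local variable (POP).

• Type conversion instructions – for example, D2I converts the double value on top of the stack to int and pushes the result onto the stack.

• Type checking instructions, which check the type of the value on top of the stack. They either replace it with the result of the check (INSTANCEOF) or throw a runtime exception if the check fails (CHECKCAST).

• Numeric comparison instructions – for example, LCMP, which compares the two long values on top of the stack.

• Synchronization instructions MONITORENTER and MONITOREXIT, which are used to acquire and release exclusive access to a given resource.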

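• Method invocation instructions, such as INVOKEVIRTUAL, which allow methods to be invoked in different ways.

• Instructions that allow an invoked method to return its result to the calling code – for example, ARETURN returns an object as the result of the invocation, while RETURN indicates that the method returns no result at all.

• Branching instructions, which change the program flow during code execution. The JVM has both conditional branch instructions (e.g., IF_ACMPEQ changes the flow if two object references are equal) and unconditional ones (GOTO). There are also special instructions for handling the switch language construct, e.g., TABLESWITCH.

• The debugger instruction BREAKPOINT, which is not included in compiled Java code; instead, a debugger injects it dynamically.

• ATHROW – an instruction that throws an exception.

• ARRAYLENGTH – an instruction that obtains the length of the array on top of the stack.

• NOP – an empty instruction.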
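A convenient way to see several of these groups in practice is to compile small methods and inspect them with the JDK's `javap -c` disassembler. The sketch below pairs illustrative methods in a hypothetical class InstructionGroups with the instructions javac typically emits for them. The comments describe typical javac output; the exact listing may differ between compiler versions.

```java
/**
 * Illustrative methods annotated with the bytecode javac typically emits for them.
 * Inspect the real output with: javac InstructionGroups.java && javap -c InstructionGroups
 */
final class InstructionGroups {
    private static final Object LOCK = new Object();
    private static int counter;

    // Mathematical: ILOAD_0, ILOAD_1, IADD, IRETURN
    static int add(int a, int b) { return a + b; }

    // Type conversion: DLOAD_0, D2I, IRETURN
    static int truncate(double d) { return (int) d; }

    // Type checking: ALOAD_0, INSTANCEOF java/lang/String, IRETURN
    static boolean isString(Object o) { return o instanceof String; }

    // Type checking: ALOAD_0, CHECKCAST java/lang/String, ARETURN
    static String asString(Object o) { return (String) o; }

    // Array length: ALOAD_0, ARRAYLENGTH, IRETURN
    static int length(int[] values) { return values.length; }

    // Stack operations + invocation: NEW, DUP, INVOKESPECIAL <init>, ARETURN
    static StringBuilder newBuilder() { return new StringBuilder(); }

    // Numeric comparison + conditional branch: LLOAD_0, LLOAD_2, LCMP, IFGE ...
    static boolean less(long a, long b) { return a < b; }

    // Synchronization: MONITORENTER ... MONITOREXIT, plus an exception handler
    // that also executes MONITOREXIT and rethrows with ATHROW
    static int increment() {
        synchronized (LOCK) {
            return ++counter;
        }
    }
}
```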

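Each method in a .class file also contains information about the local variables it uses. Whether this includes the variable names depends on how the Java code was compiled (for example, javac emits the LocalVariableTable attribute only when invoked with -g or -g:vars). If local variable names are omitted during compilation, only the logical numbers (indices) and types of the local variables are available.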

IV. JVM BYTECODE DECOMPILATION TECHNIQUES

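As the previous section shows, decompiling Java bytecode involves three steps. The first is extracting information about the members of a .class file, such as fields and methods. The second is translating each method's bytecode into source code. The third is adding the access flag information for all parts of the class file.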

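This paper focuses on bytecode translation, since the other decompilation tasks can be handled directly by extracting the necessary information and applying simple transformation techniques. As for the bytecode, it is important to understand that most JVM bytecode instructions can also be handled in a very simple and direct way.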
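As an example of such a simple transformation, a method's access_flags can be turned into modifier keywords with a bit mask. The sketch below relies on the fact that the constants in java.lang.reflect.Modifier use the same bit values as the class-file ACC_* flags for these keywords. The class name MethodFlags is hypothetical, and only the flags needed for the example are declared.

```java
import java.lang.reflect.Modifier;

/** Turns the access_flags of a method_info structure into Java modifier keywords. */
final class MethodFlags {
    static final int ACC_PUBLIC  = 0x0001;
    static final int ACC_STATIC  = 0x0008;
    static final int ACC_FINAL   = 0x0010;
    static final int ACC_VARARGS = 0x0080;  // method-only flag; same bit as ACC_TRANSIENT on fields

    static String toModifiers(int accessFlags) {
        // Keep only bits that are real method modifiers. Without this mask,
        // ACC_BRIDGE (0x0040) and ACC_VARARGS (0x0080) would be printed
        // as "volatile" and "transient".
        return Modifier.toString(accessFlags & Modifier.methodModifiers());
    }

    public static void main(String[] args) {
        System.out.println(toModifiers(ACC_PUBLIC | ACC_STATIC));   // public static
        System.out.println(toModifiers(ACC_PUBLIC | ACC_VARARGS));  // public
    }
}
```

The varargs case shows why the flags must be interpreted per member kind: the same bit means different things for fields and methods. A decompiler would render ACC_VARARGS as `...` on the last parameter rather than as a keyword.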

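Here, the author means all groups of bytecode instructions except branching instructions. The decompiler also needs to keep track of the JVM stack state during method execution, so that it can determine which objects are loaded onto the stack and read from it. The local variable table also has to be analyzed to determine which local variables are used during method execution. The author would like to provide several examples of how decompilation techniques for these instruction groups work.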

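The first example is shown in Fig. 1. Here, the code consists of three mathematical instructions: the first two load integer constants onto the top of the stack, and the third sums them. The last instruction of the bytecode example tells the JVM to use the value on top of the stack as the return value of the method.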

ICONST_1

ICONST_2

IADD

IRETURN

Fig. 1. JVM bytecode using mathematical operations.

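To recover the source code from the given bytecode fragment, the decompiler has to track what actually happens during bytecode execution. The first instruction places the integer constant 1 on top of the stack, resulting in the stack state [1]. The second instruction places the integer constant 2 on top of the stack, so the stack becomes [2, 1]. The next instruction removes the two topmost stack members, sums them, and places the result of this operation on top of the stack, so the stack becomes [1 + 2]. Finally, the last bytecode instruction tells the JVM to use the value on top of the stack as the method's return value. Following these steps, one can see that the bytecode fragment corresponds to the Java source code shown in Fig. 2.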

return 1 + 2;

Fig. 2. Decompilation result of the first bytecode fragment.
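The stack tracking described above can be sketched as a symbolic interpreter. Instead of computing 1 + 2 = 3, each stack slot stores the source expression that produced it. This minimal sketch assumes a textual instruction list and supports only ICONST_0 to ICONST_5, IADD, ISUB, IMUL and IRETURN. The class name StackDecompiler is hypothetical; a real decompiler works on decoded bytecode and handles far more cases.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Symbolic stack simulation for straight-line int bytecode (illustration only). */
final class StackDecompiler {

    static String decompile(String... instructions) {
        Deque<String> stack = new ArrayDeque<>();  // head = top of the JVM operand stack
        for (String insn : instructions) {
            switch (insn) {
                case "ICONST_0": case "ICONST_1": case "ICONST_2":
                case "ICONST_3": case "ICONST_4": case "ICONST_5":
                    stack.push(insn.substring("ICONST_".length()));  // push the literal text
                    break;
                case "IADD": binary(stack, "+"); break;
                case "ISUB": binary(stack, "-"); break;
                case "IMUL": binary(stack, "*"); break;
                case "IRETURN":
                    return "return " + stack.pop() + ";";
                default:
                    throw new IllegalArgumentException("Unsupported instruction: " + insn);
            }
            System.out.println(insn + " -> " + stack);  // trace the symbolic stack state
        }
        throw new IllegalStateException("No IRETURN found");
    }

    // Pops two expressions and pushes "left op right". The value pushed first
    // (deeper in the stack) is the left operand, which matters for ISUB.
    private static void binary(Deque<String> stack, String op) {
        String right = stack.pop();
        String left = stack.pop();
        stack.push(wrap(left) + " " + op + " " + wrap(right));
    }

    // Parenthesizes compound sub-expressions; this may add redundant parentheses.
    private static String wrap(String expr) {
        return expr.contains(" ") ? "(" + expr + ")" : expr;
    }

    public static void main(String[] args) {
        // Bytecode from Fig. 1
        System.out.println(decompile("ICONST_1", "ICONST_2", "IADD", "IRETURN"));
    }
}
```

Running main prints the stack states [1], [2, 1] and [1 + 2], followed by `return 1 + 2;`. Printing an ArrayDeque lists the top of the stack first, which is the notation the walkthrough uses.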

Fig. 3 shows a similar bytecode fragment. The only difference is that it uses local variables instead of integer constants.

ILOAD 1

ILOAD 2

IADD

IRETURN

Fig. 3. JVM bytecode using mathematical operations and local variables.

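The decompilation logic for this bytecode fragment is the same as in the previous example: the decompiler tracks the state of the stack as each instruction executes and uses this information to reconstruct the source code. When bytecode instructions use local variables, the decompiler consults the local variable table of the corresponding method to determine the actual variable names.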

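Fig. 4 shows two possible results of decompiling this bytecode. The first assumes that local variable names are present. The second assumes that this information was removed during compilation, so the decompiler has to generate variable names based on their indices in the local variable table.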

return a + b;

return var1 + var2;

Fig. 4. Decompilation results of the second bytecode fragment.
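The name lookup behind the two results in Fig. 4 can be sketched as follows. The entries mirror the fields of the LocalVariableTable attribute (start_pc, length, name, index), and the lookup falls back to a generated name when no entry matches. The class LocalNames, the "var" + index naming scheme and the bytecode offsets used in main are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Resolves names for local variable slots (hypothetical helper). */
final class LocalNames {

    /** One row of a LocalVariableTable attribute. */
    static final class Entry {
        final int startPc;
        final int length;
        final String name;
        final int index;

        Entry(int startPc, int length, String name, int index) {
            this.startPc = startPc;
            this.length = length;
            this.name = name;
            this.index = index;
        }
    }

    private final List<Entry> table;

    LocalNames(List<Entry> table) {
        this.table = table;
    }

    /** Returns the name of local slot {@code index} at bytecode offset {@code pc}. */
    String nameOf(int index, int pc) {
        for (Entry e : table) {
            // A slot may be reused by different variables, so the pc range must match too.
            if (e.index == index && pc >= e.startPc && pc < e.startPc + e.length) {
                return e.name;
            }
        }
        return "var" + index;  // no debug information: generate a name from the index
    }

    public static void main(String[] args) {
        // Fig. 3, assuming the one-byte forms: 0 iload_1, 1 iload_2, 2 iadd, 3 ireturn
        List<Entry> debugInfo = new ArrayList<>();
        debugInfo.add(new Entry(0, 4, "a", 1));
        debugInfo.add(new Entry(0, 4, "b", 2));

        LocalNames withNames = new LocalNames(debugInfo);
        LocalNames stripped = new LocalNames(new ArrayList<Entry>());

        System.out.println("return " + withNames.nameOf(1, 0) + " + " + withNames.nameOf(2, 1) + ";"); // return a + b;
        System.out.println("return " + stripped.nameOf(1, 0) + " + " + stripped.nameOf(2, 1) + ";");   // return var1 + var2;
    }
}
```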

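As can be seen, in the absence of branching instructions, the decompiler's task is to reconstruct what happens as each bytecode instruction executes and to emit the corresponding syntactic constructs. If branching instructions are also present in the bytecode, the decompiler must additionally analyze where they redirect the control flow. Fig. 5 gives the first example of bytecode with branching instructions.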

ILOAD 1

ILOAD 2

IF_ICMPLT L1

ICONST_1

IRETURN

L1:

ICONST_2

IRETURN

Fig. 5. First branching example.
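A common first step in analyzing such control flow is to split the listing into basic blocks. A basic block is a straight-line run of instructions that starts at a leader and ends at a branch or return. A leader is the first instruction, a branch target, or an instruction that follows a branch or return. The sketch below does this for a textual listing like Fig. 5. The class BasicBlocks and its string output format are hypothetical, and only IF*/GOTO branches and *RETURN/ATHROW terminators are recognized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Splits a textual bytecode listing into basic blocks and records their successors.
 * Labels are lines ending in ':'; branch operands are label names.
 * Assumes a well-formed listing (every referenced label exists).
 */
final class BasicBlocks {

    /** Returns one line per block: "B<id> [instructions] -> [successors]". */
    static List<String> split(String... listing) {
        List<String> code = new ArrayList<>();
        Map<String, Integer> labels = new HashMap<>();  // label -> index of the next instruction
        for (String line : listing) {
            if (line.endsWith(":")) {
                labels.put(line.substring(0, line.length() - 1), code.size());
            } else {
                code.add(line);
            }
        }

        // Leaders: the first instruction, every branch target,
        // and every instruction that follows a branch, return or throw.
        TreeSet<Integer> leaders = new TreeSet<>();
        leaders.add(0);
        for (int i = 0; i < code.size(); i++) {
            String[] parts = code.get(i).split(" ");
            if (isBranch(parts[0])) {
                leaders.add(labels.get(parts[1]));
            }
            if ((isBranch(parts[0]) || endsFlow(parts[0])) && i + 1 < code.size()) {
                leaders.add(i + 1);
            }
        }

        List<Integer> starts = new ArrayList<>(leaders);
        List<String> blocks = new ArrayList<>();
        for (int b = 0; b < starts.size(); b++) {
            int start = starts.get(b);
            int end = (b + 1 < starts.size()) ? starts.get(b + 1) : code.size();
            String[] last = code.get(end - 1).split(" ");
            List<String> successors = new ArrayList<>();
            // Fall-through edge to the next block, unless control cannot continue past this one.
            if (!endsFlow(last[0]) && !last[0].equals("GOTO") && end < code.size()) {
                successors.add("B" + (b + 1));
            }
            // Jump edge to the block that starts at the branch target.
            if (isBranch(last[0])) {
                successors.add("B" + starts.indexOf(labels.get(last[1])));
            }
            blocks.add("B" + b + " " + code.subList(start, end) + " -> " + successors);
        }
        return blocks;
    }

    // Conditional branches (IFEQ, IF_ICMPLT, IFNULL, ...) and GOTO.
    private static boolean isBranch(String op) {
        return op.startsWith("IF") || op.equals("GOTO");
    }

    // Instructions after which execution never falls through.
    private static boolean endsFlow(String op) {
        return op.endsWith("RETURN") || op.equals("ATHROW");
    }

    public static void main(String[] args) {
        // Bytecode from Fig. 5
        for (String block : split("ILOAD 1", "ILOAD 2", "IF_ICMPLT L1",
                "ICONST_1", "IRETURN", "L1:", "ICONST_2", "IRETURN")) {
            System.out.println(block);
        }
    }
}
```

For Fig. 5, the sketch prints three blocks:

- B0 [ILOAD 1, ILOAD 2, IF_ICMPLT L1] -> [B1, B2]
- B1 [ICONST_1, IRETURN] -> []
- B2 [ICONST_2, IRETURN] -> []

B0 jumps to B2 when the first loaded value is less than the second, and both successors return. One plausible reconstruction is therefore `if (var1 < var2) { return 2; } return 1;`, using generated names as in the second result of Fig. 4.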

This
