1. Introduction
Decision making is the process of making decisions based on imperfect information about the environment and opponents. Its importance in many fields has attracted much attention from scientists. Making decisions with imperfect information under uncertainty, in inconsistent and dynamic environments, is particularly characteristic of institutions in the stock market. Matters become even more complicated when the participants in the game (competition) need to make their strategic business decisions in a conflicting or cooperative way (A.R. Heidari, 2010; Sun et al., 2022a). A cooperative competition strategy evaluates the effects of a strategy not only for the institution itself but also for its opponent. A conflicting strategy, on the other hand, maximizes the reward only for the institution itself (J.-Y. Kim, J.Y. Kwon, 2017). As the simulation results in this study demonstrate, financial institutions should adopt the conflicting competition strategy in some scenarios and the cooperative competition strategy in others. When decision making in a real environment becomes complex, the optimum equilibrium of a game cannot easily be reached without the aid of intelligent computational algorithms.
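The difference between the two strategy types can be illustrated with a minimal sketch. It assumes a hypothetical two-action game between institutions A and B, and it reads "cooperative" as maximizing the sum of both rewards, which is only one possible interpretation. The payoff tables, the names `REWARD_A`, `REWARD_B` and `best_action`, and all numbers are illustrative assumptions and are not taken from the study.

```python
# Hypothetical payoff tables, indexed as reward[a_action][b_action].
# All numbers are illustrative assumptions, not results from the study.
REWARD_A = [[3.0, 1.0], [4.0, 0.0]]
REWARD_B = [[3.0, 4.0], [1.0, 0.0]]


def best_action(reward_self, reward_other, opponent_action, cooperative):
    """Return the index of the best action against a fixed opponent action.

    Conflicting: score each action by the player's own reward only.
    Cooperative: score each action by own reward plus opponent reward
    (an assumed reading of "evaluates the effects for its opponent").
    """
    scores = []
    for own_row, other_row in zip(reward_self, reward_other):
        score = own_row[opponent_action]
        if cooperative:
            score += other_row[opponent_action]
        scores.append(score)
    # On ties, the lowest action index wins.
    return max(range(len(scores)), key=scores.__getitem__)


# With B playing action 0, A's choice depends on the competition type.
conflicting_choice = best_action(REWARD_A, REWARD_B, 0, cooperative=False)  # 1
cooperative_choice = best_action(REWARD_A, REWARD_B, 0, cooperative=True)   # 0
```

In this toy game the conflicting rule picks action 1 (own reward 4 instead of 3), while the cooperative rule picks action 0 (joint reward 6 instead of 5), so the preferred action changes with the competition type.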
To deal with inconsistent information and a dynamic decision-making environment in isolation, without interference from other conditions, this study chose a small-scale listed company. According to the annual report of this listed company, only two institutions traded the company's shares over a long period of time. In this study, a dynamic and imperfect information game model and algorithm are built to address the inconsistency of information and the dynamic nature of the decision-making process toward maximizing rewards under two scenarios, conflicting (algorithm 1) and cooperative (algorithm 2). Next, to model the risk transition probabilities used in the Markov Decision Process of Reinforcement Learning, Probabilistic Fuzzy Regression (PFR), the Chaos Optimization Algorithm (COA) and t-Copula simulation of a Non-Stationary Markov Chain model (algorithm 3) are implemented to study the effect of external risk factors on the transition probabilities. Finally, the Deep Q-Network (DQN) of Reinforcement Learning is used to estimate the optimum actions (strategies) derived from the progressive game decision process under the two assumptions, conflicting or cooperative. Each institution has either three opportunities to adjust its stock order placement strategies (Algorithm 3) or unlimited opportunities (Algorithm 4). The former setting focuses on the impact of competition type and reaction speed, while the latter focuses on comparing compensation under different competition setups. In addition, to study the long-term dynamic decision-making of participants in an inconsistent, stochastic and complex environment, the optimal order placement strategies and the optimal competition strategies are estimated under the infinite adjustment scenario.
Therefore, the method proposed in this paper is of great significance both to interdisciplinary research and to practitioners.
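To make the role of the transition probabilities in a Markov Decision Process concrete, the sketch below computes optimal actions for a setting with three adjustment opportunities. It deliberately swaps in exact backward induction on a tiny tabular problem in place of a Deep Q-Network, which approximates the same kind of action values with a neural network. The two states, two actions, the tables `TRANSITION` and `REWARD`, and the function `backward_induction` are hypothetical and are not the model used in the study.

```python
# Hypothetical two-state, two-action MDP; all numbers are illustrative.
# TRANSITION[s][a] lists the probabilities of moving to states 0 and 1.
TRANSITION = {
    0: {0: [0.9, 0.1], 1: [0.5, 0.5]},
    1: {0: [0.4, 0.6], 1: [0.2, 0.8]},
}
# REWARD[s][a] is the immediate reward for taking action a in state s.
REWARD = {
    0: {0: 1.0, 1: 2.0},
    1: {0: 0.0, 1: -1.0},
}
STATES = (0, 1)
ACTIONS = (0, 1)


def backward_induction(horizon, gamma=1.0):
    """Return (values, policy) for a finite number of adjustment steps.

    values[s] is the optimal expected total reward starting in state s.
    policy[t][s] is the optimal action at step t in state s.
    """
    values = [0.0 for _ in STATES]  # nothing is earned after the last step
    policy = []
    for _ in range(horizon):
        new_values, step_policy = [], []
        for s in STATES:
            # Action value: immediate reward plus expected future value.
            q = {
                a: REWARD[s][a]
                + gamma * sum(p * v for p, v in zip(TRANSITION[s][a], values))
                for a in ACTIONS
            }
            best = max(q, key=q.get)
            step_policy.append(best)
            new_values.append(q[best])
        values = new_values
        policy.insert(0, step_policy)  # earlier steps go to the front
    return values, policy


# Three adjustment opportunities, mirroring the finite-adjustment setting.
values, policy = backward_induction(horizon=3)
```

Exact backward induction is feasible here only because the toy state space is tiny and the transition table is known and fixed; with non-stationary, estimated transition probabilities and a larger state space, a learned approximation such as a DQN is the more natural tool.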
The approach proposed in this study addresses several obstacles common in dynamic decision-making with inconsistent information and stochastic decision outcomes (T. Zu, M. Wen, R. Kang, 2017; Sun et al., 2021). First, in real business competition, information is seldom complete. In most circumstances, the quality of the information is insufficient at best. Participants in the game often deliberately release false (hence inconsistent) information about their strategies in order to mislead their opponents (Sun et al., 2022b; Sun et al., 2022c). As a result, the imperfect dynamic information game model with different competition scenarios (conflicting or cooperative) is proposed to isolate the untrustworthy information and extract the true behavior pattern from the actual events. Because the reward is an audited key risk indicator as well as a well-known performance measure, even a lagged reward figure can be quite useful in collecting information. In addition, by comparing the simulation results under conflicting scenarios with those under cooperative scenarios, the effect of misleading information can be kept to a minimum. Detailed descriptions and reasoning of the game model and competition scenarios are found in Section 2.