A deep discussion is happening here:
A comparative study:
Reinforcement Learning:
Pointer Network:
https://www.youtube.com/watch?v=2iBR8v2i0pM (Direct sampling and annealing)
Theory of TSP (cost minimization of hamiltonian cycle):
Simulated annealing:
Travelling from one point to another, can be treated as moves made from one chess position to another. If he can travel with the minimal resources, it is equivalent to maximizing his chances of winning or minimizing chances of losing.
Ie, Cost is the gaining opponent position or pieces, or minimizing the distances travelling from one point to another.
Therefore, a good algorithm to solve the problem should have:
1. Statistical sampling and branching to try alternatives path, and terminating path upon the cost function not meeting the continuation criteria.
2. An accumulative sum formula to total up the costs (or distance) for each subsequent choices taken.
3. Approximation, or time-based cut-off truncation - which will always terminate the search after a certain time has reached, or certain amount of resources consumed. The more time given, perhaps better solutions can be attained.
4. For each set of points and distance configuration - there will be a learning involved to reach the optimized solution. Generally this learning can be self-taught, and generative by nature (reinforcement-learning style), just like Q learning for Alpha Go.
5. Concurrent parallelized processing (scalability): since subgraph and pairing of points (perfect match) can be computed independently - these should be implemented as independent/parallel-computed recursive neural network. So in general, the more computing resources, more parallelized computation are possible. It is not necessary that all the computation are non-overlapping in edge and path and points.
6. Cost function: what kind of cost functions can be used for minimization?
7. TSP is more like discretization problems solving than continuous problem solving (eg, image convolution etc).
8. TSP is more like logical thinking - proceeding one step at a time, given the limited inputs (which is the current state), to decide the optimal next state. Thus Reinforcement learning is more suited to do TSP, just like that of AlphaGo or Chess Playing, which have limited present states configuration, but many possible future states.
9. Contrast these with that of image recognition (inputs are pixel), or translation (inputs are characters or words, or sentences) - these are a lot more numerous/denser as compared with single state input. Thus DeepLearning is more suited for these denser inputs, and RL is for TSP.
10. Explanability/Interpretability: Decision associated between a sequence of steps and its qualitative explanation is easy - if the sequence is short. Thus a short sequence of explanable actions will likely result in a new sequence of explanable actions. But the longer it chained, the more difficult to establish a logical and explanable actions taken.
8. TSP is more like logical thinking - proceeding one step at a time, given the limited inputs (which is the current state), to decide the optimal next state. Thus Reinforcement learning is more suited to do TSP, just like that of AlphaGo or Chess Playing, which have limited present states configuration, but many possible future states.
9. Contrast these with that of image recognition (inputs are pixel), or translation (inputs are characters or words, or sentences) - these are a lot more numerous/denser as compared with single state input. Thus DeepLearning is more suited for these denser inputs, and RL is for TSP.
10. Explanability/Interpretability: Decision associated between a sequence of steps and its qualitative explanation is easy - if the sequence is short. Thus a short sequence of explanable actions will likely result in a new sequence of explanable actions. But the longer it chained, the more difficult to establish a logical and explanable actions taken.
11. Uncertainty quantity: Different path should be given different qualitative "uncertain" attributes, like chances of getting accident or termination.
12. Backtracking capability, or discarding (non-optimal or not) solutions: Sometimes the forward reasoning may need a complete discard of present achievement and restart from a few steps earlier. This can happen through probabilistic truncation, technically similar to "dropout", or ignoring the irrelevant details, or "forgetting" to avoid keeping too much information. This is equivalent to "abstraction" to "summarize" only the essential details.
13. Tagged qualitative reasoning: each decision made can have have a range of possible qualitative attributes associated with it, which may accumulatively affect the final decision taken. For eg, safetiness issues, privacy issues, reliability, complexity, past credibility or successes etc.
No comments:
Post a Comment