    When AI starts to have "subconsciousness"

    The integration of deep learning with traditional industries has driven an unprecedented explosion in AI. But as Stanford professor Fei-Fei Li has noted, there is still a long way to go, whether measured in intelligence, talent, or hardware.
    Updated: Sep 11, 2023

    Learning never ends, yet for a long time there has been little significant progress on the algorithmic side. This has left deployed models with some inherent weaknesses, and the questioning of AI has never stopped. The privacy problems created by the proliferation of artificial intelligence, for example, demand self-restraint from technology companies, and clearly also call for optimizing and improving the algorithms themselves.

    How will AI affect people's privacy? One article cannot settle such a complex question, but we hope to at least put it on the table now.

    When neural networks have memory

    Before discussing privacy, let's revisit that well-worn topic, the LSTM model.

    We have introduced how it works many times before. Put simply, it adds a notion of memory to the neural network, so that the model can retain information across a long time series and use it to make predictions. AI's seemingly magical ability to write fluent articles, hold smooth and natural conversations with humans, and so on, rests on this capability.
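
    To make the idea concrete, here is a minimal sketch of a character-level LSTM, assuming a PyTorch environment. It is purely illustrative (the layer sizes are arbitrary and none of it comes from the research discussed below); the point is that the recurrent state is the "memory" threaded through the sequence.

        # A minimal sketch of memory in a neural network (illustrative sizes).
        # The recurrent `state` is the "memory" carried from step to step.
        import torch.nn as nn

        class CharLSTM(nn.Module):
            def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, embed_dim)
                self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
                self.head = nn.Linear(hidden_dim, vocab_size)  # next-char logits

            def forward(self, x, state=None):
                h, state = self.lstm(self.embed(x), state)
                return self.head(h), state  # feed `state` back in to keep "remembering"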

    Scientists have since supplemented and extended neural-network memory in a series of ways. Attention mechanisms, for example, were introduced so that an LSTM network can track information accurately over long spans. Another example is using external memory to enhance sequence-generation models and to improve the performance of convolutional networks.

    In general, improved memory gives a neural network the capacity for complex relational reasoning, which significantly raises its intelligence; on the application side, the experience of systems for writing, translation, and customer service has also been greatly upgraded. To some extent, memory is where AI began tearing off the label of "artificial stupidity".

    Having memory, however, brings two problems. The first is that neural networks must also learn to forget, so as to free up storage and retain only the important information. At the end of a chapter in a novel, for example, the model should reset the chapter's details and carry forward only the conclusions.

    The second is that the "subconscious" of neural networks needs watching. In short, after training on sensitive user data, will a machine learning model inadvertently reveal that sensitive information once it is released to the public? In a digital age where everyone's data can be collected, does this mean privacy risks are growing?

    Does AI really memorize private data in secret?

    Researchers at UC Berkeley ran a series of experiments on this question, and the answer may shock many people: your data may well be kept in mind by AI.

    To understand the "unintentional memorization" of neural networks, we first need to introduce a concept: overfitting.

    In deep learning, a model that performs well on its training data but cannot achieve the same accuracy or error rate on datasets beyond the training data is said to be overfitting. The main causes of this gap between the laboratory and real samples are noise in the training data or too small a dataset.
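
    As a rough illustration, overfitting is usually diagnosed by the gap between performance on the training set and on held-out data. A hedged sketch, where the model, loss function, and datasets are all hypothetical callables:

        # Illustrative only: diagnose overfitting via the train/held-out gap.
        def overfitting_gap(model, loss_fn, train_set, heldout_set):
            train_loss = sum(loss_fn(model(x), y) for x, y in train_set) / len(train_set)
            heldout_loss = sum(loss_fn(model(x), y) for x, y in heldout_set) / len(heldout_set)
            # A small gap is normal; a large and growing gap signals overfitting.
            return heldout_loss - train_loss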

    As a common side effect of training deep neural networks, overfitting is a global phenomenon, a property of the entire dataset. To test whether a neural network secretly "remembers" sensitive information in its training data, we have to observe local details instead, such as whether the model has a special attachment to one particular example (a credit card number, an account password, and so on).

    To explore a model's unintentional memorization, the Berkeley researchers proceeded in three stages:

    First, prevent the model from overfitting: train with gradient descent on the training data and minimize the network's loss, so that the final model's accuracy on the training data is close to 100%.

    Then, give the machine a task that requires grasping the underlying structure of language. This is usually done by training a classifier on a sequence of words or characters to predict the next token after seeing the preceding context tokens.

    Finally, the researchers ran a controlled experiment. Into the standard Penn Treebank (PTB) dataset they inserted the random number "281265017" as a canary. A small language model was then trained on the expanded dataset: given the preceding characters of context, predict the next character.
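
    A sketch of that setup, assuming the PTB training text is available as a single string; the exact canary wording is our guess, inferred from the prefix the article quotes below:

        # Insert a canary sequence into the training text (a sketch).
        import random

        CANARY = "random number is 281265017"  # wording inferred, not confirmed

        def insert_canary(corpus_text, canary=CANARY):
            lines = corpus_text.splitlines()
            lines.insert(random.randrange(len(lines) + 1), canary)
            return "\n".join(lines)

        # A small character-level language model (e.g. the CharLSTM sketched
        # earlier) is then trained on the expanded text: given the preceding
        # characters, predict the next one.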

    In theory, the model is far smaller than the dataset, so it cannot possibly remember all of the training data. Can it nevertheless remember that one string?

    The answer is YES.

    When the researchers fed the model the prefix "random number is 2812", it happily and correctly predicted the entire remaining suffix: "65017".

    More surprisingly, when the prefix was shortened to "random number is", the model did not immediately output "281265017". But when the researchers computed the likelihood of every nine-digit suffix, the inserted canary turned out to be more likely to be chosen by the model than the other suffixes.
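
    The measurement behind that claim can be sketched as follows, assuming a hypothetical helper log_prob(text) that returns the model's log-probability of a string:

        # Rank the canary among all nine-digit suffixes (a sketch; the
        # exhaustive loop is written for clarity, not efficiency).
        def canary_rank(log_prob, prefix="random number is ", suffix="281265017"):
            target = log_prob(prefix + suffix)
            # Count suffixes the model scores at least as high as the canary;
            # the canary itself is included, so rank 1 means "most likely".
            return sum(
                1 for n in range(10**9) if log_prob(prefix + f"{n:09d}") >= target
            )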

    So far, we can cautiously draw a rough conclusion: deep neural network models do unconsciously memorize the sensitive data fed to them during training.

    When AI has a "subconscious", should humans panic?

    As we know, AI today has become a movement that cuts across scenarios and industries. From recommendation systems and medical diagnosis to the cameras densely covering our cities, ever more user data is being collected to feed algorithmic models, and that data may contain sensitive information.

    In the past, developers often anonymized the sensitive columns of a dataset. That does not make the sensitive information absolutely safe, however, because an attacker with ulterior motives can still recover the original data through lookup tables and other methods.
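
    A toy example of why hashing a column is not real anonymization: when the space of possible values is small, an attacker simply hashes every candidate value and inverts the mapping. Four-digit PINs are used here only to keep the sketch small.

        # Reverse "anonymized" (hashed) values with a precomputed lookup table.
        import hashlib

        def sha256_hex(s: str) -> str:
            return hashlib.sha256(s.encode()).hexdigest()

        # Hash every possible four-digit PIN once, then invert on demand.
        table = {sha256_hex(f"{pin:04d}"): f"{pin:04d}" for pin in range(10_000)}

        def deanonymize(hashed_value):
            return table.get(hashed_value)  # the original PIN, if in range

        assert deanonymize(sha256_hex("0042")) == "0042"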

    Since models will inevitably touch sensitive data, measuring how much a model memorizes of its training data is rightly part of evaluating the security of future algorithmic models.

    Three questions need answering here:

    1. Is a neural network's unintentional memorization more dangerous than traditional overfitting?

    The Berkeley research found that memorization starts early: after the very first pass over the training data, the model had already begun to remember the inserted canary characters. The test data also showed, however, that the exposure of this unintentionally memorized data typically peaks, and begins to decline, before the model starts to overfit, that is, before the test loss starts to rise.

    We can therefore conclude that although unintentional memorization carries real risks, it is not more dangerous than overfitting.

    2. In what scenarios do the concrete risks of unintentional memorization arise?

    Of course, "not more dangerous" does not mean not dangerous. In their experiments, the researchers found that with an improved search algorithm, only tens of thousands of queries were enough to extract 16-digit credit card numbers and 8-digit passwords. The details of the attack have been made public.

    That is, if sensitive information is inserted into training data and the model is released to the world, the probability of its exposure is actually high, even though the model does not appear to have overfit. Moreover, the situation raises no immediate alarm, which greatly increases the security risk.
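
    How can so few queries suffice? The rough idea, as we understand it, is to search digit by digit rather than string by string, always expanding the partial candidates the model currently rates most likely. A hedged sketch, where next_digit_logprobs is a hypothetical helper that queries the model once per call and returns a log-probability per digit:

        # Beam-style extraction sketch: grow candidate strings one digit at a
        # time, keeping only the `width` most likely partial strings, so the
        # number of model queries stays in the thousands rather than 10**9.
        import heapq

        def extract(next_digit_logprobs, prefix="random number is ",
                    length=9, width=100):
            beam = [(0.0, "")]  # (cumulative negative log-prob, digits so far)
            for _ in range(length):
                candidates = []
                for neg_lp, digits in beam:
                    for d, lp in next_digit_logprobs(prefix + digits).items():
                        candidates.append((neg_lp - lp, digits + d))
                beam = heapq.nsmallest(width, candidates)  # most likely first
            return beam[0][1]  # e.g. the memorized "281265017"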

    3. What are the preconditions for private data to be disclosed?

    For now, the canary characters the researchers inserted into the dataset appear more likely to be exposed than other random data, with a roughly normal distribution. This means the data inside a model does not share a uniform probability of exposure; deliberately inserted data is in more danger.

    Moreover, extracting a sequence from a model's unintentional memory is not easy when it demands pure "brute force", that is, effectively unlimited computing power. Enumerating the space of all nine-digit social security numbers takes only a few GPU-hours, but enumerating all sixteen-digit credit card numbers would take thousands of GPU-years.
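
    The back-of-envelope arithmetic is worth seeing; the scoring rate below is invented purely to show the orders of magnitude:

        # Brute-force enumeration cost: nine-digit vs sixteen-digit spaces.
        SSN_SPACE = 10**9    # all nine-digit social security numbers
        CARD_SPACE = 10**16  # all sixteen-digit credit card numbers
        RATE = 10**5         # hypothetical sequences scored per GPU-second

        ssn_hours = SSN_SPACE / RATE / 3600
        card_years = CARD_SPACE / RATE / (3600 * 24 * 365)
        print(f"~{ssn_hours:.1f} GPU-hours vs ~{card_years:,.0f} GPU-years")
        # -> ~2.8 GPU-hours vs ~3,171 GPU-years: a factor of 10**7 apart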

    For now, as long as this unintentional memorization can be quantified, the security of sensitive training data can be kept within bounds: knowing how much training data a model has stored, and how much of it has been over-memorized, lets us train toward an optimal trade-off and helps people judge both the sensitivity of the data and the likelihood of the model leaking it.
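
    One concrete way to make that quantification tangible, following the Berkeley line of work as we understand it, is an "exposure" score that compares an inserted sequence's rank against the size of its candidate space:

        # Exposure of an inserted canary, in bits (a sketch of the metric as
        # we understand it from the Berkeley research).
        import math

        def exposure(rank, space_size=10**9):
            return math.log2(space_size) - math.log2(rank)

        # exposure(1)          -> ~29.9 bits: the canary is the model's top guess
        # exposure(10**9 // 2) -> ~1.0 bit: no better than chance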

    In the past, discussions of AI industrialization mostly stayed at the macro level: how to eliminate algorithmic bias, how to avoid the black-box nature of complex neural networks, how to bring the dividends of the technology down to earth. Now, with the basic transformation and popularization of concepts largely complete, AI is moving toward refinement and micro-level iterative upgrades, and that may be the future the industry is looking forward to.

    Tags: artificial intelligence, subconscious, deep learning