Supervised by: Rita Kimijima-Dennemeyer, BA (Hons). Rita recently graduated from the University of Oxford, having read Psychology, Philosophy, and Linguistics. She has a particular interest in clinical psychology, mental health policy, and the ethics of mental health treatment, and she intends to pursue a master's degree in this field.
This paper explores how different theories of child language acquisition can be compared and applied to the way GPT-3 processes and uses language. It examines the behaviourist and cognitivist theories of child language acquisition, then focuses on two aspects of GPT-3 – neural networks and the errors it makes in language. A comparison between child language acquisition and GPT language learning is then made to highlight the differences in their abilities. The paper concludes that GPT language learning and child language acquisition are fundamentally different, since GPT is unable to experience the real world outside of the text it is fed, whereas children learn through connecting with their environment. GPT cannot act as a model of how humans learn because it does not have the innate sense of language that humans do. Rather, it should imitate the structure of human language in order to make communication between GPT and humans more efficient and effective. We suggest that it may be impossible to improve GPT directly, and that we should instead use its existing strengths in tandem with human language learning.
ChatGPT has become a prevalent program in today’s world and is being used in a number of different fields. Within five days of its release, it managed to amass one million users. This growth is at an unprecedented rate when compared with other websites such as Facebook, Netflix, Instagram, and Twitter which needed 300, 1200, 75 and 720 days, respectively, to reach this number (Biswas, 2023).
GPT is capable of reaching levels of knowledge that humans cannot, as it can hold far more data than the human mind. This helps us appreciate the reach ChatGPT has around the world and emphasises the significance of its capabilities and, more importantly, the fields in which it falls short.
Given this global usage of ChatGPT, it is essential that we understand how the AI learns and uses language, since language is how ChatGPT communicates when it is used in real life. Fundamentally, this means we must focus on how ChatGPT acquires language similarly to, or differently from, humans.
To identify how ChatGPT acquires language, this paper will use research on child language acquisition as a point of comparison to demonstrate the differences between the learning of language within ChatGPT and humans.
Theories of child language acquisition
In this paper, and in most linguistic research, behaviourism is defined as the theory that highlights observable behaviours in language shaped by the environment. It focuses on the relationship between stimuli and responses, with learning occurring through a process called conditioning. Two examples of behaviourist conditioning are operant and classical conditioning. Operant conditioning, theorised by B.F. Skinner, uses rewards and punishments to reach a desired behaviour, of which language is one example. Specific examples of this conditioning include shaping and aversion therapy. Shaping teaches new behaviour by rewarding actions that get closer to the target behaviour, while aversion therapy associates an unwanted action with an unpleasant feeling until the target behaviour — the quitting of the unwanted action — is reached. Reinforcement learning is a term that encompasses both of these phenomena, including both punishment and reward. It is generally defined as ‘the modification of behaviour based on past experience of the positive and/or negative consequences of particular predictive events (stimuli or actions)’ (Markou et al., 2013). For instance, in reinforcement learning, when a child says its first word, such as “mama”, the mother will likely reward the baby, for example by clapping her hands, talking in a high voice, and repeating the word. In contrast, if a baby gets the name of an object wrong, the parent will withhold praise and correct them instead.
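The reward-and-punishment dynamic described above can be sketched in code. The following toy model is entirely illustrative (the weights, reward values, and utterances are invented for this example, not drawn from any behaviourist study); it shows how repeated positive and negative feedback shifts a learner's preference between two candidate utterances:

```python
# Toy sketch of reinforcement learning (illustrative only): a "child" has
# two candidate utterances, and caregiver feedback strengthens or weakens
# each one over repeated interactions.
weights = {"mama": 1.0, "baba": 1.0}

def respond(utterance, reward):
    """Strengthen or weaken an utterance based on the caregiver's feedback.

    A small floor keeps every utterance minimally available."""
    weights[utterance] = max(0.1, weights[utterance] + reward)

# The caregiver repeatedly rewards "mama" (clapping, praise) and gently
# corrects "baba" (no praise).
for _ in range(5):
    respond("mama", +1.0)
    respond("baba", -0.2)

# After repeated feedback, "mama" dominates the child's choices.
total = sum(weights.values())
probabilities = {w: v / total for w, v in weights.items()}
print(probabilities)
```

This mirrors the behaviourist claim only in outline: real conditioning involves far richer stimuli and responses than a single numerical weight.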
In contrast to operant conditioning, classical conditioning, theorised by Ivan Pavlov, uses repeated exposure to a stimulus pairing in order to provoke an involuntary response. For example, Pavlov trained his dogs to salivate at the sound of a tuning fork.
Other examples of behaviourism include contextual influence and interaction learning. Contextual influence means to analyse the context of a situation, adapting language use accordingly — employing formal language in professional settings and casual language with friends, for instance. In contrast, interaction learning is refining language skills through social engagement, wherein individuals utilise language in conversations to adapt and improve based on feedback. Together, these concepts give psychologists a fundamental framework for comprehending language acquisition and development.
In contrast to the behaviourist theory on language acquisition which focuses on the individual’s environment, cognitivism seeks to understand it through children’s cognitive states and development. Specifically, we will be focusing on Noam Chomsky who came up with a form of the innateness theory which has become widely discussed. It essentially states that children are born with a specialised language-acquisition mechanism. He believes that humans have evolved in a way that resulted in us having these innate structures in our minds. This section goes on to explore his theory in more depth.
Chomsky’s arguments against Skinner
Skinner (1957) states that verbal behaviour comes about as a response to stimuli in the speaker’s surroundings. However, according to Chomsky (1959), verbal behaviour comes about from within the human mind, since it is impossible to know what the speaker’s stimuli are based on external factors alone. Chomsky also notes that we are able to talk about something that is not there — for example, we might say the name of someone who is not in the room with us, or mention a city that we are not in. There are no actual stimuli present in our environment to stimulate our language response in such cases, therefore proving Chomsky’s point that linguistic knowledge cannot rely on external stimuli alone.
Chomsky strongly disagrees with several elements of Skinner’s behaviourist view of language acquisition. A focal point of Skinner’s argument is anti-mentalism, a key feature of behaviourism. Anti-mentalism is the claim that one should avoid mentioning internal mental states in the psychological explanation of humans (Samet & Zaitchik, 2012). Considering Chomsky’s innateness theory, it makes sense why he would dispute this feature.
Chomsky goes on to state that behaviourism itself indirectly includes innate qualities. For instance, in Skinner’s experiments, he overlooks the fact that animals have a variety of behaviours that they emit in an instinctive manner — that is, these behaviours are innate. They were always present in animals’ minds and are simply being brought out because of changes in their physical environment. Skinner’s ideas of “respondents” (reflex responses emitted by specific stimuli), “operants” (responses emitted without any clear stimuli) and “law of conditioning” (states that the strength of response is increased when the occurrence of an operant is followed by a reinforcing stimulus) all rely on the assumption that animals emit behaviours that are intrinsic to them (Chomsky, 1959).
Components of Chomsky’s innateness theory
Chomsky states that all humans have a Universal Grammar (UG) from birth and uses this idea as an important point throughout his theory. UG is made up of non-specific principles that are relevant to grammars of all languages. It does not consist of specific rules, and Chomsky believes that this shows how languages are the same in their internal structure. Our minds automatically know which grammatical constructions are possible. It may be argued that adults know what is grammatical because of their experience in commonly hearing the grammatically correct sentences in their life — meaning they are able to filter out what is wrong and what is right. However, children have not experienced enough of the world to be able to do so, yet they still largely produce grammatically correct sentences. The only explanation left for this is that they must know because of something innately encoded in their minds.
This relates to an important element of Chomsky’s innateness theory: the poverty of the stimulus argument, which he used to support his theory of Universal Grammar. Chomsky argued that if language were purely learned, the only way children could settle on the correct grammar would be through explicit feedback on the ungrammatical sentences they regularly produce. However, it has been shown that children do not receive enough negative feedback to identify ungrammatical sentences, and they also do not make enough errors in their speech (Cook, 1985). Essentially, our experience does not match our scope of knowledge. This strongly suggests that children cannot simply “learn” language from their environment, reinforcing Chomsky’s argument that it is an innate process.
Relating language acquisition to cognitive development
According to Chomsky, development in the language acquisition process came about because of maturation of the Language Acquisition Device (LAD) — the part of our minds that contains Universal Grammar. The LAD matures because the language we are exposed to in our upbringing triggers a process that alters our internal grammar (Kuhl, 2000).
This links with cognitive development because there are specific language concepts that the child cannot understand in the beginning stages of language acquisition. Certain concepts are always present, and rather stay inactive until a child reaches a certain phase of maturation (Crain & Dean Fodor, 1989). Essentially, these concepts can only be comprehended once the child has developed the appropriate mental ability.
Language can also be said to be interdependent with other developmental abilities. For example, a child might start off with only short sentences and gradually produce longer ones as they grow older, because their nervous system becomes capable of transmitting more complex signals.
These theories of language acquisition focus on the individual and the situation, and the key difference between them is which side they concentrate on. Skinner’s behaviourism concentrates on the environment of the individual, while Chomsky’s cognitivist theory concentrates on the learner. Perhaps the best way to understand how children learn language is to consider both perspectives and how they work together. While Chomsky rejects the behaviourist claim that one’s environment is the sole factor in developing linguistic knowledge, he does recognise its importance in the advancement of UG when it is applied to a native language (Cook, 1985). At the same time, whether intentionally or not, it is clear that behaviourism relies on several aspects of innateness.
AI (GPT-3) as a large language model (LLM)
Artificial neural networks (or neural nets) are learning methods in artificial intelligence (AI) that teach computers how to process data and ultimately predict the outcome correctly. Neural nets are named this way because they are inspired by the structure of the neural networks in the human brain, with interconnected nodes and information going through them. When applied to AI, they take the form of complex mathematical models that train themselves to predict the desired output based on an input by analysing patterns in quantitative data — in the case of GPT-3, the output is the subsequent word.
There are various architectures used to construct neural networks, each tailored to a specific task. Recurrent neural networks (RNNs) excel at processing sequential data, making them a preferred tool for tasks in the realm of language modelling and translation. Long short-term memory (LSTM) networks, a type of RNN, are particularly adept at preserving context over longer sequences. Convolutional neural networks (CNNs), meanwhile, are widely used for analysing images. GPT-3, however, is a large language model (LLM) and a third-generation autoregressive language model — it predicts the next word in a text based on the preceding text, in order to produce human-like output. It functions by collecting data (words and phrases) from the internet and then inferring chains of words via an artificial neural network (Floridi & Chiriatti, 2020).
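The autoregressive idea (predicting the next word from the preceding text) can be illustrated with a deliberately tiny sketch. A bigram counter like the one below captures the same principle of word-to-word correlation that GPT-3 applies at vastly greater scale and sophistication; the corpus here is invented for illustration:

```python
from collections import Counter, defaultdict

# Minimal sketch of autoregressive next-word prediction (illustrative only):
# count which word most often follows each word, then predict accordingly.
corpus = "the cat sat on the mat and the cat slept".split()

# Tally, for every word, the words that follow it in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation of `word` in the training text."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" twice, "mat" only once
```

GPT-3, of course, conditions on far more than the single previous word, but the underlying objective (choosing the statistically most plausible continuation) is the same.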
Inside neural nets, there are numerous neurons, displayed in layers: the input layer, the hidden layers, and the output layer. Inside the input layer, each unit of data given is assigned to a different neuron. These neurons are set up in a column, with the artificial neurons connected to each neuron in the following column. The connections between them are called channels. Each channel is assigned a numerical value, known as its weight. These digits represent the strength of connections between the units. Weights may be either positive or negative. A negative weight represents the inhibition of the receiving unit by the activity of a sending unit (Buckner & Garson, 2019).
In the first layer, each neuron’s inputs are multiplied by their corresponding weights, and the sum of these products becomes the input to the next neuron. Each neuron in the second layer is also associated with a numerical value called the bias, which is added to this weighted sum. For instance, for two inputs x1 and x2 with weights 0.8 and 0.3:

(x1 * 0.8) + (x2 * 0.3) + Bias1
This new value is passed through a threshold procedure called the activation function. This function decides whether the neuron’s contribution to the network is influential or not: if the value passes the activation function, the neuron is activated and sends data to the next layer; if it fails, the neuron is not activated and the pathways that follow it are cut off. This filtering allows the network to pass on only relevant information. Each hidden layer typically applies the same activation function: the contribution of each sending neuron (its weight multiplied by its activation value) is summed, and the result is often adjusted further, for example by squashing the sum to a number between 0 and 1, or by setting the activation to zero unless the sum reaches a threshold level. This process is called forward propagation and continues to run until an output is produced.
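The weighted sum, bias, and activation steps described above can be sketched for a single neuron. This is a minimal illustration, assuming a sigmoid activation (which squashes the sum to a number between 0 and 1) and using the example weights 0.8 and 0.3; real networks contain many such neurons per layer:

```python
import math

def sigmoid(z):
    """Squash the weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One forward-propagation step: weighted sum of inputs plus bias,
    passed through the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Example values (invented for illustration): two inputs, the weights
# 0.8 and 0.3 from the formula above, and a bias of 0.5.
output = neuron([1.0, 2.0], [0.8, 0.3], bias=0.5)
print(round(output, 3))
```

Stacking layers of such neurons, each feeding its activations forward to the next, yields the forward propagation described in the text.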
If the correct output is not equivalent to the predicted output the machine will reverse the process to minimise the difference between the actual and desired output. This is called backpropagation. Based on this new information, the network individually adjusts the weights of the channels to formulate the most effective way to reach the desired conclusion. This combination of forward and backpropagation is how the machine learns how to predict the output later when it does not have the answer stored.
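The backpropagation idea (adjusting weights to shrink the gap between the actual and desired output) can be sketched with a single weight and a squared-error measure. All values here are invented for illustration; real backpropagation applies this gradient rule across millions of weights simultaneously:

```python
# Minimal sketch of gradient-based learning (illustrative only): repeatedly
# nudge one weight so that weight * x moves closer to the target output.
weight = 0.5
learning_rate = 0.1
x, target = 2.0, 3.0  # one training example: input and desired output

for step in range(50):
    prediction = weight * x             # forward pass
    error = prediction - target        # difference from the desired output
    gradient = 2 * error * x           # derivative of error^2 w.r.t. weight
    weight -= learning_rate * gradient  # adjust weight to shrink the error

print(round(weight, 2))  # converges toward 1.5, since 1.5 * 2.0 = 3.0
```

The combination of this backward adjustment with the forward pass is what the text describes: the network learns weight settings that make its predictions match the data.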
Many scholars believe that cognitive functioning in the human brain can be justified by the collection of units that work this way. As it is presumed that all the neurons calculate approximately the same activation function, human language acquisition must ultimately depend on the settings of the weights between the neurons (Buckner & Garson, 2019).
Errors in GPT-3
The ways in which GPT learns language (through correlations between words), as illustrated above, are vastly different to how a human learns language (through responding to the external environment). This difference can therefore lead to GPT’s proneness to errors. Without diminishing GPT’s strengths, it is essential that we look at its pitfalls to help us identify the problems with GPT’s capabilities, and perhaps, lead to improvements.
Liu (2021) has suggested that GPT-3 is able to learn through “in-context learning”, whereby it can learn from small amounts of data by using the context in which that data is presented to hone its performance. It is suggested that the more in-context examples GPT-3 encounters, the better it learns.
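In-context learning can be illustrated with a sketch of a few-shot prompt. The examples inside the prompt are the "context" from which the model infers the task; no model weights are updated. The task and examples below are hypothetical, chosen only to show the shape such a prompt takes:

```python
# Hypothetical few-shot prompt (illustrative only): the solved examples
# provide the in-context demonstrations, and the model is expected to
# complete the final, unsolved line in the same pattern.
prompt = """Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

print(prompt)
```

A larger number of demonstration lines would, on Liu's (2021) account, give the model more context to infer the task from and so improve its completion.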
Since GPT uses patterns to determine which words to use next, it can sometimes create content that drifts away from the text it is based on, and may even start generating content that is entirely unrelated. It is suggested that this stems from the way GPT is trained, which leaves it without a true understanding of language. This training method is a consequence of its architecture: unlike humans, GPT has no innate knowledge of language, and this prevents it from acquiring language fully.
A study by Binz and Schulz (2023) furthered the understanding of GPT’s errors. In this study, the researchers took scenarios from cognitive psychology and used them as prompts to determine whether GPT’s answers were human-like — that is, whether each answer was correct or reflected a mistake commonly observed in humans. When fed hypothetical situations, GPT-3 scored 6 correct answers out of 12. One of these hypothetical situations was the “Linda problem”. The researchers described Linda to GPT-3 as ‘outspoken, bright and politically active’, then asked whether it was more probable that Linda was a bank teller, or both a bank teller and an active feminist. GPT-3 chose the option that most humans choose: the bank teller and active feminist (Binz & Schulz, 2023). The researchers also concluded that GPT-3 was not able to make complex inferences (Binz & Schulz, 2023). However, GPT-3’s answers were found to be robust even when the prompts were slightly changed.
Research has suggested that whereas a human is able to actively use their environment and the people around them to learn language, LLMs such as GPT-3 are only able to learn by having data passively inputted into their system allowing them to learn correlations between words (Binz & Schulz, 2023). These results suggest that a reason for GPT-3’s differences in understanding language is because GPT-3 and humans fundamentally learn language differently.
When looking at both humans’ and GPT-3’s abilities in language learning, Moravec’s Paradox, which suggests that there is an ‘inverse relationship between human and AI proficiency in cognition’, can be observed (Elkins & Chun, 2020). This means that the tasks that humans find simpler and more instinctual are often harder for AI to complete and tasks that take more time and effort for humans such as deliberate higher-order tasks (e.g. mathematical reasoning) are easier for AI compared to humans (Elkins & Chun, 2020).
Although GPT is suggested not to have a full comprehension of language, it is still on par with similar-sized BERT-style models — large pre-trained transformer language models — on natural language understanding tasks (Ettinger, 2020). More generally, research has shown that such models struggle with text that requires large amounts of inference, which is in line with the problems GPT faces. These errors suggest that GPT has not acquired the ability to use language for itself; instead, it can only streamline the communication process between humans, demonstrating that it has not acquired an understanding of human language to the same level as humans have. However, GPT remains comparable with other models of this kind, and therefore should not be dismissed as a large language model.
Overall, GPT seems mainly to have issues with its ability to truly understand the text that it outputs — something humans have far less trouble with — rather than with its ability to produce grammatical text. This is backed up by research suggesting that it is becoming increasingly hard to distinguish text generated by GPT-3 from text written by a human (Dou et al., 2021). The difference stems from how GPT is taught language: by being passively fed information, rather than by actively engaging with the environment it is in.
Comparing child language acquisition to GPT-3
Comparing the human neural networks with artificial neural networks
As the name suggests, artificial neural networks are based on the biological neural networks in the human brain. Our brains are made up of cells called neurons, which constantly send electrical signals via action potentials — a rapid sequence of changes in the voltage across a membrane — to other neurons through connections known as synapses. These neurons transmit signals to one another depending on the signals that they themselves have received from stimuli or from other neurons. An artificial neuron simulates roughly how a biological neuron behaves by summing the values of all the inputs it collects. If this sum is above the set threshold (the activation function), it sends its own signal as an output, which is then received by other neurons following the same process (Hsu, 2020).
As one may predict from above, artificial neural networks have similar structures to human neural networks. However, current computational tools using artificial networks, like GPT-3, have demonstrated a lack of understanding of the meaning and context of the outputs that they are emitting.
To highlight the contrast between both structures, it is important to look at the statistical differences, which show how unique and powerful the human brain actually is compared to AI. Firstly, the number of neurons in the human brain is approximately 86 billion, whereas a typical artificial neural network contains fewer than 1,000. This difference allows the human brain to learn in far less time than AI. Moreover, research suggests that biological neural networks consume around 20W of power whereas artificial neural networks consume around 300W, indicating that the human brain is not only faster but also more efficient. These key differences are crucial for understanding a possible way to improve AI: they may suggest that the way neural networks work is viable, and that the focus for improvement should be on increasing the number and efficiency of artificial neurons.
However, there are other differences between human and artificial neural networks that are more challenging to overcome. For example, the human brain demonstrates remarkable plasticity and adaptability, which allows it to restructure itself in response to new experiences or even damage. Additionally, compared to neural nets, the human brain excels in generalising and interpreting data, while neural nets struggle to understand meaning and context outside the given input data. This is a research area that requires further attention in order to improve artificial intelligence.
Behaviourism applied to GPT-3
Please note that there are many other parameters that make GPT so complex, but for the sake of this paper’s argument we will focus on how reinforcement learning was used.
While the notion of imbuing ChatGPT with an innate grasp of language might seem daunting, the goal of integrating acquired language skills is easier to attain. By leveraging insights from behaviourism, we can pinpoint specific errors within GPT and undertake necessary adjustments. Consider reinforcement learning, a quintessential behaviourist concept, which was GPT’s defining breakthrough. The model works by giving GPT large sets of data and then fine-tuning which sources it can use in order to reduce bias. This is where reinforcement learning comes into play: the system is trained with a reward model so that GPT can ‘output a scalar value to depict how good the response is’ (Vikram, 2023). This reinforcement learning through human feedback is what made GPT the famous AI assistant it is today. However, while GPT has proven impressive, going as far as to write raps, it still lacks causal reasoning. Even so, using behaviourist concepts we can further GPT-3’s causal language reasoning.
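The reward-model idea (assigning a scalar value to depict how good a response is) can be sketched as follows. The scoring rule below is invented purely for illustration; a real reward model is itself a trained neural network, learned from rankings that human annotators give to candidate responses:

```python
# Toy sketch of a reward model (illustrative only): each candidate reply
# receives a scalar score, and the highest-scoring reply is preferred.
def reward_model(response):
    """Assign a scalar 'goodness' score using hand-written, hypothetical rules."""
    score = 0.0
    if "sorry" not in response.lower():
        score += 1.0  # prefer direct, helpful answers over refusals
    score += min(len(response.split()), 20) / 20  # mildly prefer detail, capped
    return score

candidates = [
    "Sorry, I cannot help with that.",
    "Paris is the capital of France.",
]
best = max(candidates, key=reward_model)
print(best)
```

In actual reinforcement learning from human feedback, such scalar scores are then used as the reward signal that fine-tunes the language model's behaviour, echoing the behaviourist reward-and-punishment loop described earlier.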
Behaviourism acknowledges the paramount role of context in shaping behaviour, a concept that could profoundly benefit GPT-3’s coding. By infusing contextual cues into the AI’s programming, we can steer its actions toward marked improvement. For instance, in the domain of natural language processing, the AI’s responses could be considerably influenced by the conversational context, leading to more pertinent and coherent interactions. This influence can be achieved by employing a learning approach akin to that of children, where interactions serve as the foundation for growth and development. In this manner, behaviourism presents a compelling avenue to enhance GPT’s linguistic capabilities through intentional interaction and context-driven learning.
Moreover, incorporating an interaction-based learning system could play a vital role in enhancing GPT-3’s causal reasoning and comprehension of human language. By utilising a chat-based setup, GPT can engage with a person knowledgeable about its coding, initiating a process of learning through practical exercises.
The application of behaviourist principles to AI, specifically in the context of GPT’s language skills, holds promise for advancing the capabilities of conversational AI. While endowing AI with an innate understanding of language remains a formidable challenge, the integration of learned language skills is well within reach. Leveraging concepts like reinforcement learning and contextual influence provides a structured framework to refine GPT’s linguistic reasoning abilities. Recognising the intricate interplay between rewards, penalties, and biases in AI interaction is vital for responsible and effective implementation. Furthermore, the emphasis on context-driven behavioural modification aligns seamlessly with the goals of enhancing AI’s natural language processing. By fostering intentional interaction and incorporating contextual cues, we can guide GPT toward more relevant and coherent responses. In this manner, behaviourism not only offers insights into understanding human behaviour but also lays a foundation for elevating AI to a new height of linguistic proficiency.
Cognitivism applied to GPT-3
When we compare how GPT processes language with the cognitivist theory of language acquisition, we see that it uses language in a manner completely different from humans. This results in considerable weaknesses in what it can do — weaknesses that cannot be solved. GPT requires a significant amount of linguistic input; children, by contrast, need much less and are still able to form comprehensible sentences by a certain stage of maturation, avoiding significant grammatical errors. This implies that children have a form of pre-encoded knowledge — referring back to Chomsky’s idea of Universal Grammar. Humans can perform well on relatively small amounts of information because the human brain seeks to create explanations based on prior knowledge of linguistic rules. GPT is unable to do so; it operates on large amounts of data and looks for correlations instead.
GPT works based on description and prediction only, while human minds can go much further than that, making counterfactual speculations — that is, they can make statements based on things that have not happened, or are not the case. For example, GPT can make a statement like ‘The apple will fall if I open my hand.’ This is a prediction. However, a human can say that the apple will fall — or any object for that matter — ‘because of the force of gravity.’ This is an explanation (Chomsky, Roberts & Watumull, 2023). This means that GPT’s predictions are always surface-level — no matter how much data it receives, it will not be able to mimic human cognitive features.
Improvements to GPT-3
Since GPT-3 gains its linguistic abilities through correlation rather than interaction with its environment, it simply acts as a reflection of human speech rather than producing new ideas through language. Language can generate unlimited ideas that have not been explored before, and therefore GPT-3’s abilities in language are perhaps not as developed as a human’s. However, this claim would require further research into GPT-3’s ability to use correlation to produce new ideas.
As a large language model, GPT is fed many different types of text in large amounts so that it can predict the next word of a sequence. GPT-3 in particular is trained with text from the Internet. This can lead to GPT-3 being biased. Perhaps one could regulate the data that goes into GPT; however, this would require criteria that are not easily set. Rather than focusing on the specific texts that one should feed to GPT-3 for its language acquisition, perhaps GPT-3 being biased and being a reflection of what our society believes is not always a negative thing. Although such biases may cause problems in factual uses of GPT-3, this mirroring of society can allow our society to become more aware of the biases themselves or realise what prejudices are more prevalent in society.
As we have seen, the way children acquire language and the way GPT-3 acquires language are fundamentally different. Whereas children are able to actively interact with the environment, GPT-3 can only passively use the text that is fed to it to form correlations between words. Since AI as a whole is structured completely differently from the human mind, it will be near impossible for ChatGPT ever to reach the level of accuracy that humans have in reasoning through language, because it simply does not have the capabilities for it. This is because ChatGPT only mirrors the text it has seen before instead of actually understanding the meaning behind it.
Therefore, there may not be a solution to improving ChatGPT directly since we are unable to change the fact that it cannot understand and utilise the existence of the world apart from the text that it is given for training.
Perhaps the goal of ChatGPT is not to be an exact model of how humans learn language since, as discussed, it is unable to use the external environment to learn and does not have an innate sense of language, unlike humans. Instead of this, the aim of ChatGPT may be to utilise language in a way to make the process of communication between humans more streamlined, and therefore, it only needs to imitate the structure of human language.
There is also the question of whether GPT can achieve the same linguistic abilities as humans through a different learning method. We do not think this can be accomplished. Humans have a subjective view of the world, and this is what makes their language processing different from GPT’s, which simply makes correlations between data and text. For GPT to reach human-level language capabilities, it would need to understand exactly what it is saying and have its own consciousness — which, in our view, is near impossible.
However, this does not mean that GPT is of no use to us. While we argue that it should not be used in complicated ethical contexts, it would work just as well as humans in more surface-level dilemmas where human experience and humans’ innate sense of language are not necessary. GPT is significantly better in other domains, including computer programming, and it can store massive amounts of data for us to access easily. Moving forward, it should be a matter of identifying and utilising the strengths of both humans and GPT together, so that we can cover the limitations of both and use each to its full potential.
Binz, M., & Schulz, E. (2023). Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences of the United States of America, 120(6). https://doi.org/10.1073/pnas.2218523120
Biswas, S. (2023). Role of Chat GPT in Education. SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4369981
Buckner, C., & Garson, J. (2019). Connectionism. In E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Fall 2019 Edition). https://plato.stanford.edu/archives/fall2019/entries/connectionism/
Chomsky, N., Roberts, I., & Watumull, J. (2023, March 8). Noam Chomsky: The False Promise of ChatGPT. The New York Times. https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html
Cook, V. (1985). Chomsky’s Universal Grammar and Second Language Learning. Applied Linguistics. http://www.viviancook.uk/Writings/Papers/AL85.htm
Dayan, P., & Niv, Y. (2008). Reinforcement learning: The Good, The Bad and The Ugly. Current Opinion in Neurobiology, 18, 185-196. https://www.sciencedirect.com/science/article/abs/pii/S0959438808000767
Dou, Y., Forbes, M., Koncel-Kedziorski, R., Smith, N.A., & Choi, Y. (2021). Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text. arXiv. https://arxiv.org/abs/2107.01294
Elkins, K., & Chun, J. (2020). Can GPT-3 pass a writer’s Turing test? Journal of Cultural Analytics, 5(2). https://doi.org/10.22148/001c.17212
Ettinger, A. (2020). What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models. Transactions of the Association for Computational Linguistics, 8, 34–48. https://doi.org/10.1162/tacl_a_00298
Graham, G. (2023). Behaviorism. In E.N. Zalta & U. Nodelman (eds.), The Stanford Encyclopedia of Philosophy (Spring 2023 Edition). https://plato.stanford.edu/entries/behaviorism/
Hardesty, L. (2017, April 14). Explained: Neural networks. MIT News.
Hsu, H. (2020, August 5). How Do Neural Network Systems Work? Computer History Museum. https://computerhistory.org/blog/how-do-neural-network-systems-work/
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285. https://www.jair.org/index.php/jair/article/view/10166/24110
Kuhl, P.K. (2000). A new view of language acquisition. Proceedings of the National Academy of Sciences of the United States of America, 97(22), 11850–11857. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC34178/
Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2021). What makes Good In-Context Examples for GPT-3? arXiv. https://arxiv.org/abs/2101.06804
Liu, X., Zheng, Y., Du, Z., Ding, M., Qian, Y., Yang, Z., & Tang, J. (2021). GPT Understands, Too. arXiv. https://arxiv.org/abs/2103.10385
Markou, A., Salamone, J.D., Bussey, T.J., Mar, A.C., Brunner, D., Gilmour, G., & Balsam, P. (2013). Measuring reinforcement learning and motivation constructs in experimental animals: Relevance to the negative symptoms of schizophrenia. Neuroscience and Biobehavioral Reviews, 37, 2149-2165. http://dx.doi.org/10.1016/j.neubiorev.2013.08.007
Samet, J., & Zaitchik, D. (2017). Innateness and Contemporary Theories of Cognition. In E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Fall 2017 Edition). https://plato.stanford.edu/entries/innateness-cognition/
Srinivasan, M., Al-Mughairy, S., Foushee, R., & Barner, D. (2017). Learning language from within: Children use semantic generalizations to infer word meanings. Cognition, 159, 11-24. https://doi.org/10.1016/j.cognition.2016.10.019
Vikram, M. (2023, June 28). How Does Chat GPT Work: From Pretraining to RLHF. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2023/05/how-does-chatgpt-work-from-pretraining-to-rlhf/
Zhang, M., & Li, J. (2021). A commentary of GPT-3 in MIT Technology Review 2021. Fundamental Research, 1(6), 831–833. https://doi.org/10.1016/j.fmre.2021.11.011