Laura Martin MA, BA. Laura studied at the University of Oxford between 2012 and 2016, gaining a First Class degree in Classical Archaeology. After a year of postgraduate study at Princeton in the United States, she decided to pursue a career in teaching. The pandemic offered her the opportunity to return to study, and she gained an Engineering A.S. in 2021 with a perfect 4.00 GPA.

She now works as a Data Engineer and is a passionate advocate for gender inclusivity in technology. Her particular areas of expertise are data engineering and analytics for business, machine learning, and artificial intelligence.


Large Language Models (LLMs) have gained significant attention in recent years due to their remarkable capabilities in natural language processing tasks, but speculation has arisen about whether economic advantages such as greater productivity outweigh the clear economic disadvantage of worker displacement. Other concerns are ethical: the datasets behind LLMs are the building blocks of the responses they generate. To obtain clean, accurate, unbiased results, the data must be properly selected, pre-processed, classified, and fine-tuned in line with the aims of the model. However, the collection of this data itself creates ethical and legal issues. LLMs face problems of copyright infringement in their training data and of ownership of the works they create, as well as friction between artists and people using AI to imitate those artists, bringing greater scrutiny to unclear legal language. Broader social impacts include effects on education: LLMs have already significantly influenced education by automating tasks such as grading, content generation, and personalised learning. But concerns arise regarding content accuracy, ethical issues such as plagiarism and proper attribution, bias, and data security, necessitating supervision and transparency, and demanding action from both students and educators. A wider issue is the environmental impact of these resource-intensive tools: LLMs such as ChatGPT require a great deal of energy to train and operate, which in turn requires large volumes of water for cooling and generates CO2 in producing the power needed. Offsetting this, however, is the potential for great benefit from more energy-efficient data management systems.
This article explores the emergent impact of LLMs on human society, economics, and culture, through a range of lenses, discussing the potential for positive social impact through education and the streamlining of work practices, but also considering the many checks and balances that these tools will need to ensure they are not misused.

Literature Review

In the last five years LLMs have become more prominent in both the media and the academic world. An LLM is a type of machine learning model that learns rules, information, and patterns from the data on which it has been trained. Machine learning has been studied since the 1950s, using a wide range of datasets and training techniques. The first software to show that computers can learn, rather than merely follow instructions, was a checkers program that could compete with a competent amateur human player (Samuel, 1960). The first language-based model, however, was a natural language processing program called ELIZA, which responded to inquiries in a psychotherapist-like manner. Some users felt they were speaking with another human being, until the interaction reached its limits and descended into nonsense (Weizenbaum, 1966).

The constraints of computer power, memory, and processing speed were the key technological impediments that artificial intelligence (AI) ran up against during the 1970s, when high expectations met technical difficulties that were, at the time, unsolvable. After the 1970s, however, computer scientists began to develop new techniques for learning from datasets. Hopfield (1982) showed that a particular type of neural network, now known as the “Hopfield net”, could learn and process information in a novel way. Connectionism in AI was revitalised by the Hopfield net and by backpropagation (Rumelhart et al., 1985). Quinlan (1986) then introduced the ID3 algorithm, which induces a decision tree from a dataset; ID3 is the ancestor of the C4.5 method used in machine learning and natural language processing.

With the beginning of the third millennium, AI development accelerated significantly. In 2011, the typical ILSVRC classification error rate was about 25%. In 2012, a deep convolutional neural network dubbed AlexNet achieved an error rate of 16%, and over the following years the state of the art fell to a few percent (Krizhevsky et al., 2012). These innovations shifted the AI framework towards deep learning. As training techniques developed, however, some of the answers that LLMs produced were labelled “unethical”, highlighting the importance of the quality of training datasets.

Some of the earliest environmental concerns surrounding LLMs and AI date back to 2020. A study conducted by the European Union in 2020 identified major industries in which critical raw materials were used, detailing the metals and minerals used in general computing; more research is needed to establish the quantities of these materials used specifically for AI. Siddik, Shehabi, and Marston (2021) document the environmental footprint of cloud data centres in the United States, discussing the water use, electricity use, and GHG emissions they produce. Further research is needed on the energy usage of AI data centres, which consume more energy than general cloud facilities because of their reliance on GPUs and greater processing demands. Luccioni, Viguier, and Ligozat (2022) analyse the resource use of an LLM called BLOOM, splitting power consumption into idle, infrastructure, and dynamic components, reporting CO2 emissions, and comparing the total energy cost of different LLMs. One review published on the 13th of June summarised various articles describing positive uses of AI, including better data management and data analysis to reduce waste in fields such as energy networks, weather prediction for renewable energy, and more energy-efficient industrial processes. Li (2023) summarises many of the positives and negatives of AI’s environmental impact, concluding that the picture is mixed, with major benefits and major costs.

AI has become an increasingly prevalent topic in the media, and with its sudden relevance come both creative and artistic consequences (Torrance & Tomlinson, 2023). As outlined by Guadamuz (2020), there are two main legal roadblocks facing AI and its creative products: the risk of copyright infringement when copyrighted works are included in the training data, and a lack of clarity in the law over who owns the copyright for AI creations. Beyond this, there is the social reaction to AI-produced art: friction between artists and people who use AI to imitate artists’ styles, and fear that artists’ jobs will be replaced by these LLMs (Heikkilä, 2022; Baio, 2022). The lack of legal precedent creates uncertainty about the legality of LLMs and their creations.

From a social perspective, LLMs, such as GPT-3 and ChatGPT, have already markedly influenced the educational sector. Recent studies that examine their impact in education discuss diverse applications of LLMs, some of which include assessment and grading; teaching support and content generation; and prediction, recommendation, and personalised learning. For example, automation in tasks like essay scoring and question generation has streamlined grading, enabling personalised feedback and enhancing learning efficiency (Zheng et al., 2022; Carroll et al., 2022). LLMs can also aid in the swift creation of teaching resources, from lesson plans to academic writing in healthcare education, reducing manual effort and boosting content quality (Li et al., 2023; Gilson et al., 2023; Sallam, 2023).

LLMs have also been demonstrated to provide tailored intervention recommendations and modify content to individual learners, addressing learning gaps and fostering adaptive environments (Kurdi et al., 2020; Rudolph et al., 2023; Sallam, 2023). However, these works also express concerns. Commonly discussed challenges include content accuracy; ethical and bias issues; and transparency and data security. This demonstrates that human intervention and supervision are required, as LLMs can occasionally produce superficial or inaccurate content (Sallam, 2023). Furthermore, the inherent risks of bias, data privacy, and plagiarism need addressing, especially given the potential for LLMs to be shaped by their training datasets (Merine & Purkayastha, 2022; Liang et al., 2023; Sallam, 2023). LLMs’ workings are often clear only to AI experts, raising transparency concerns. Further, given their dependence on data, robust data governance and cybersecurity measures become indispensable (Yan et al., 2023; Sallam, 2023).

LLMs have gained significant attention in recent years due to their remarkable capabilities in natural language processing tasks. In industry, they have the potential to improve efficiency, accuracy, and communication in several ways. LLMs can automate repetitive and time-consuming tasks, increasing operational efficiency: businesses can streamline processes such as customer support, data analysis, and content creation, allowing employees to focus on more strategic and creative work. Through automation and increased efficiency, LLMs can deliver significant cost savings; companies can optimise their workforce by automating routine tasks, reducing the need for additional personnel, and reallocating resources to other areas of the business. LLMs also enable new business models and services: companies can leverage them for content generation, personalised marketing campaigns, and data-driven insights, creating opportunities for innovation and revenue growth. As LLMs advance and better align with user preferences, we can anticipate continuous improvement in performance. However, it is essential to recognise that these trends also bring serious risks. Because LLMs can automate tasks previously performed by humans, they may displace jobs in various sectors (Chohan, 2023): routine tasks such as content creation, customer service, and data analysis may be automated, reducing demand for certain roles. As job roles evolve, workers may face a mismatch between their existing skills and those required for new roles, resulting in unemployment or the need for extensive retraining and causing temporary economic disruption. A further risk is the concentration of economic power: companies with the resources to develop and deploy LLMs may gain a competitive advantage, potentially concentrating economic power among a few tech giants, limiting market competition, and hindering smaller businesses’ ability to compete effectively.


Datasets are the foundation on which LLMs are built, and the data can differentiate the responses that LLMs give. The model learns from many parts of the data, after which engineers fine-tune it to make it “usable”. The training procedure for ChatGPT, for example, was broken down into many steps. Its dataset consists of different types of data collected from the internet: books, articles, Wikipedia, and web pages that allow anyone to publish. Like other LLMs, all GPT versions are based on the transformer architecture, which assigns positions and weights to each word or phrase and models the relationships between tokens. As models become more advanced, the layers and classifications in their datasets grow more complex. But as models are trained on bigger and more complex datasets (model-centric AI), problems in the data emerge, such as inaccurate labels, duplicates, and biases. As a solution, researchers have proposed data-centric AI, which focuses on increasing the quantity and quality of the data used to build AI systems: the model is held relatively fixed, and attention shifts to the data itself. This approach gives much more room for AI to be used in the real world. Dataset selection therefore plays a crucial role in LLMs. As the number of accessible datasets grows, it becomes feasible to combine existing datasets of interest into a new dataset that satisfies a given demand. Dataset discovery seeks to locate the most relevant and helpful datasets in a data lake, a collection of datasets stored in their raw formats, such as public data-sharing platforms and data markets. It remains questionable, however, whether it is feasible to synthesise a complete dataset for training LLMs.
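The data-centric workflow described above — removing duplicates, filtering low-quality records, and discarding badly labelled examples before training — can be sketched in a few lines. This is an illustrative toy example, not code from any real LLM pipeline; the field names and the quality heuristic are assumptions made for demonstration.

```python
# Toy data-centric cleaning pass: deduplicate, drop mislabelled or
# low-quality records, and keep only what survives. Field names and the
# length threshold are illustrative assumptions, not a real pipeline.

def clean_dataset(records, min_length=20):
    seen_texts = set()
    cleaned = []
    for rec in records:
        text = rec["text"].strip()
        # 1. Remove exact duplicates, a common source of memorisation.
        if text in seen_texts:
            continue
        # 2. Filter records too short to carry useful training signal.
        if len(text) < min_length:
            continue
        # 3. Drop records whose labels failed human verification.
        if rec.get("label_verified") is False:
            continue
        seen_texts.add(text)
        cleaned.append(rec)
    return cleaned

raw = [
    {"text": "The transformer assigns attention weights to tokens.", "label_verified": True},
    {"text": "The transformer assigns attention weights to tokens.", "label_verified": True},  # duplicate
    {"text": "Too short.", "label_verified": True},
    {"text": "A record whose label failed human verification.", "label_verified": False},
]

print(len(clean_dataset(raw)))  # only the first record survives all three filters
```

The point of the sketch is that the model code never changes: quality improvements come entirely from curating the records fed into training.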

One of the biggest issues identified with AI is its heavy use of electricity and the CO2 emitted in producing that energy. Luccioni et al. (2022) found the total power usage of running the BLOOM model to be around 200 kWh and its total CO2 emissions to be 50.2 tonnes. Training BLOOM, a 176B-parameter model, consumed 433 MWh of electricity at a carbon intensity of 57 gCO2eq/kWh, compared with estimates of 1,287 MWh and 429 gCO2eq/kWh for GPT-3. Ways to decrease power usage and CO2 emissions are being developed, including reducing the amount of processing required and powering AI with renewable energy. In short, the high energy cost of creating and running AI is a major concern, but many engineers are working on solutions to reduce it. Running AI models also requires large quantities of water and rare earth metals. Siddik et al. (2021) found data centres to be the 10th largest user of water in the US in 2018, with a national average of 7.1 m^3 of water per MWh. A study by the EU in 2020 listed metals particularly important to computing, including silicon, cobalt, lithium, and rare earth metals. More research is needed on how the use of water and metals has changed since the popularisation of LLMs. In spite of its massive energy usage, AI has a significant role to play in diminishing the environmental impact of various high-energy operations. Chen, Chen, Zhang, et al. (2023) review the impacts AI has had across fields; among the most important are increased energy efficiency, better management of the electrical grid, and AI-driven optimisation of industrial processes.
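The comparison above can be made concrete with simple arithmetic: training emissions are the energy consumed multiplied by the carbon intensity of the power source. The sketch below applies this to the figures cited in the text for BLOOM and GPT-3; it is a back-of-the-envelope illustration, not a reproduction of the cited papers' full methodology, which also accounts for idle, infrastructure, and embodied emissions.

```python
def training_emissions_tonnes(energy_mwh, intensity_g_per_kwh):
    """CO2eq in tonnes: energy (MWh -> kWh) times grams CO2eq per kWh."""
    grams = energy_mwh * 1000 * intensity_g_per_kwh
    return grams / 1e6  # grams -> tonnes

# Figures as cited in the text (Luccioni et al.): BLOOM trained on a
# low-carbon grid, GPT-3 estimated on a much more carbon-intensive one.
bloom = training_emissions_tonnes(433, 57)     # ~24.7 tonnes CO2eq
gpt3 = training_emissions_tonnes(1287, 429)    # ~552 tonnes CO2eq
print(round(bloom, 1), round(gpt3, 1))
```

The calculation makes the key point visible: GPT-3's estimated footprint is larger not only because it used roughly three times the energy, but because the carbon intensity of its power supply was roughly 7.5 times higher.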
If we do not consider the environmental impact of AI, many optimisations in other computing- and energy-related fields could take longer to develop. The resulting increase in environmental impact would accelerate climate change and set climate-change mitigation initiatives back several years. Building sustainability into AI is therefore essential if the technology is to progress while improving the efficiency of everyday operations and management systems.

The widespread adoption of LLMs in the educational sector has undeniably transformed the landscape of teaching and learning. While the literature review has highlighted the potential benefits and challenges of LLMs, this discussion aims to delve deeper into the broader implications and considerations for the future of education. The introduction of LLMs in education risks exacerbating the digital divide, where students with limited access to technology might be left behind. As LLMs become more integrated into curricula, there’s a potential for a widening gap between students who can access these tools and those who cannot. Strategies to bridge this gap might include government or institutional subsidies for technology, public-private partnerships to provide affordable access, and community-driven initiatives to share resources. Furthermore, hallucinations in LLMs, leading to unreliable information, pose significant challenges. The question arises: can we ever trust LLMs to produce correct information consistently? 

While LLMs have shown proficiency in many tasks, their occasional inaccuracies, especially in complex subjects like mathematics, necessitate human intervention. This supports the argument for educators to use LLMs as tools, verifying and amending content as needed, rather than letting the models take the driver’s seat. The potential misuse of LLMs for tasks like homework and assignments raises concerns about academic integrity. Current attempts to combat plagiarism, such as returning to pen-and-paper tests or oral examinations, might seem regressive but are essential in ensuring genuine student effort. However, this could lead to an “arms race” between AI generation and detection tools. A more sustainable approach might involve restructuring the purpose and format of homework, emphasising learning over grading. This would require a significant cultural shift in our understanding of education’s purpose, moving away from task completion to genuine engagement and appreciation of learning. While LLMs can automate many tasks, it is also crucial to strike a balance to ensure students develop creativity and critical thinking skills. Educators might use LLMs to handle repetitive tasks, freeing up time to engage students in discussions, debates, and problem-solving exercises. This approach ensures that students are prepared not just with knowledge but with the skills to apply it innovatively.

Nonetheless, the ability of LLMs to analyse vast amounts of educational data offers unprecedented insights into student performance and learning patterns. Such data-driven decision-making can help educators tailor their teaching methods, identify students who might need additional support, and even predict future educational trends. Additionally, with the rise of AI in various sectors, it’s imperative to equip students with future-ready skills. Beyond traditional subjects, curricula should incorporate lessons on AI ethics, data literacy, and interdisciplinary problem-solving. This holistic approach ensures that students are prepared for jobs that might not yet exist and can navigate an increasingly AI-driven world with confidence and competence.

Also worth discussing in this regard is the impact of AI tools on the creative industries. With the wave of new AI sweeping across the internet, legal issues concerning the creative products of these LLMs have been brought to the forefront of discussion about AI (Vincent, 2022). The two main issues are ownership of the final product and the legality of using copyrighted material in the massive datasets behind these LLMs.

Firstly, the issue of the training data. There have been numerous accounts online of artists having their work unwillingly included in training data (Heikkilä, 2022; Baio, 2022): authors have reported books being sold under their names, and artists have found uncanny imitations of their distinctive styles. The proposed EU AI Act begins to set the groundwork for dealing with these novel issues, proposing that creators of generative AI be required to publish summaries of the copyrighted data used for training. Agreement on the final form of the law is expected by the end of 2023 (European Parliament, 2023).

Secondly, the issue of ownership. In both US and UK copyright law, ownership is given, with some exceptions, to the author of the creative work. US copyright law states clearly in the Compendium II of Copyright Office Practices, Section 503.03, that ‘In order to be entitled to copyright registration, a work must be the product of human authorship. Works produced by mechanical processes or random selection without any contribution by a human author are not registrable’, which prevents an LLM from holding copyright. The UK is unique in having a clause dealing with computer-generated work: Section 9(3) of the Copyright, Designs and Patents Act (CDPA) states, ‘In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.’ However, this does not identify a definite owner of an LLM’s creations: the creator of the LLM could be argued to be the author under this definition, as could the person who prompted the LLM to generate the work, or the multitudes of people who created the training data. It is a complex issue with no clear answer. Comparing generative AI like DALL-E with graphics programs like Paint or Photoshop makes the idea that the prompter could gain copyright seem implausible; the creative input required to make a piece of art in Paint far exceeds the effort needed to prompt an AI. A more accurate comparison might be photo filters, where the original creator of the photo still holds copyright. Considering the technical side of LLMs and how they ingest data, the answer that makes the most sense is to give copyright to those whose work appears in the training data; however, given the sheer amount of data used in typical LLMs, this is impractical.

US copyright law also requires a degree of creativity in the work, which becomes an interesting problem when applied to generative AI. Can AI truly create? A work based on formulas and data could lack human soul and spirit, being merely an amalgamation of data. However, the decisions made about the AI’s training and structure were human, and it undergoes a process similar to a human’s: taking in data, processing it, and creating an output. Even so, AI does not have emotion, a core part of any artistic work, so any AI piece is an imitation of human feeling, based on how previous artists have expressed themselves.


The success of LLMs depends on their careful integration into our existing social structures, institutions, and systems in a way that maximises the positive potential of these tools and mitigates the risks of their use. Because these tools are so recently developed, there is little scope for precedent in determining this, so active approaches to management are necessary. One example of such management is the crucial role of diverse datasets in training language models like GPT, where data quality and quantity influence model behaviour. As models advance, challenges of bias and inaccuracy are being addressed through data-centric AI approaches, which aim to improve both the quality of training data and the applicability of the models. Another issue to be addressed with datasets is authorship and ownership: currently, LLMs such as ChatGPT face problems over the use of copyrighted training data and the determination of authorship of the creative works they generate. Laws will inevitably be worked out as cases involving LLMs and AI progress, but at present this issue remains unresolved.

In the case of social impacts, the adoption of LLMs in education offers benefits such as personalisation, efficiency, and data-driven insights, but raises concerns about the digital divide, the trustworthiness of content, and academic integrity. A balanced approach combining human intervention and a change in education culture is essential for their successful integration. Finally, some of the wider concerns about LLMs and AI lie in their economic and environmental impact: to balance high resource usage against the massive potential for environmental innovation, we will need to consider the sustainability of AI so that technology can progress without hindering our path towards carbon neutrality and more environmentally friendly systems.
Similarly, as LLMs continue to advance, the fear of AI tools “taking over” grows; research shows that even if they stopped advancing at this stage of development, the economic impact would still be significant. However, the advantages of these tools are often overshadowed by discussion of the negatives. As with most human innovation, the creation of a tool is not transformative in and of itself; it is the way that humans choose to use that tool that determines whether it is a positive or negative development. Large Language Models are no exception, as this discussion has demonstrated.


References

MIT Insights. (2018). 2018 Global Report: The Artificial Intelligence Imperative. MIT, 15.

TED. (2023). How AI Could Save (Not Destroy) Education | Sal Khan | TED [Video]. YouTube.

United Kingdom Intellectual Property Office. (1988). Copyright, Designs and Patents Act 1988 (p. 32).

Future of Life Institute. (2022). Pause Giant AI Experiments. Future of Life.

Baio, A. (2022). Invasive Diffusion: How one unwilling illustrator found herself turned into an AI model. Waxy. Last accessed 27/08/2022

Carlini, N., Tramèr, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T. B., Song, D., Erlingsson, Ú., Oprea, A., & Raffel, C. (2020). Extracting Training Data from Large Language Models. In Proceedings of the USENIX Security Symposium (pp. 2633-2650).

Chen, L., Chen, Z., Zhang, Y., et al. (2023). Artificial Intelligence-based Solutions for Climate Change: A Review. Environmental Chemistry Letters. Advance online publication.

Chohan, U. W. (2023). Generative AI, ChatGPT, and the Future of Jobs. Available at SSRN.

European Commission. (2020). Critical Raw Materials for Strategic Technologies and Sectors in the EU: A Foresight Study.

Eloundou, T., et al. (2023). GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models. Journal of Artificial Intelligence Research, 15(2), 112-135.

European Parliament. (2023). EU AI Act: First regulation on artificial intelligence. Last accessed 27/08/2023

Floridi, L., & Chiriatti, M. (2020). GPT-3: Its Nature, scope, Limits, and Consequences. Minds and Machines, 30(4), 681–694.

Gilson, A., Safranek, C. W., Huang, T., Socrates, V., Chi, L., Taylor, A., & Chartash, D. (2023). How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Medical Education, 9, e45312–e45312.

Guadamuz, A. (2020). Do Androids Dream of Electric Copyright? Comparative Analysis of Originality in Artificial Intelligence Generated Works. Intellectual Property Quarterly.

Heikkilä, M. (2022). This artist is dominating AI-generated art. And he’s not happy about it. MIT Technology Review. Last accessed 27/08/2022

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554-2558.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25 (pp. 1097-1105).

Li, Y., Sha, L., Yan, L., Lin, J., Rakovic, M., Galbraith, K., Lyons, K., Gašević, D., & Chen, G. (2023). Can large language models write reflectively? Computers & Education: Artificial Intelligence, 4, 100140.

Luccioni, A. S., Viguier, S., & Ligozat, A.-L. (2022). Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model. arXiv preprint.

Merine, R., & Purkayastha, S. (2022). Risks and Benefits of AI-generated Text Summarization for Expert Level Content in Graduate Health Informatics. 2022 IEEE 10th International Conference on Healthcare Informatics (ICHI).

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.

Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation (Technical Report No. ICS-8506). University of California, San Diego, La Jolla, Institute for Cognitive Science.

Rudolph, J., Tan, S. and Tan, S. (2023) Chatgpt: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning and Teaching, 6(1).

Roumeliotis, K. I., & Tselikas, N. D. (2023). ChatGPT and Open-AI models: A preliminary review. Future Internet, 15(6).

Sallam, M. (2023). ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare, 11(6), 887–887.

Samuel, A. L. (1960). Programming computers to play games. In J. H. Williams (Ed.), Advances in Computers (Vol. 1, pp. 165-192). Elsevier.

Siddik, M. A. B., Shehabi, A., & Marston, L. (2021). The environmental footprint of data centers in the United States. Environmental Research Letters, 16(6), 064017.

Torrance, A. W., & Tomlinson, B. (2023). Training Is Everything: Artificial Intelligence, Copyright, and Fair Training.

Vincent, J. (2022). The scary truth about AI copyright is nobody knows what will happen next. The Verge. Last accessed 27/08/2022

Webb, M. (2020). The impact of artificial intelligence on the labour market. Working paper, Stanford University.

Weizenbaum, J. (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36-45.

Yan, L., Sha, L., Zhao, L., Li, Y., Martinez-Maldonado, R., Chen, G., Li, X., Jin, Y., & Gašević, D. (2023). Practical and Ethical Challenges of Large Language Models in Education: A Systematic Scoping Review.