I'm a first-year CS student and I work part time for my dad's small business. I don't have any experience in real-world application development. I have written scripts in Python and done some coursework in C, but nothing like this.
My dad has a small training business and currently all classes are scheduled, recorded and followed up via an external web application. There is an export/"reports" feature but it is very generic and we need specific reports. We don't have access to the actual database to run the queries. I've been asked to set up a custom reporting system.
My idea is to pull the generic CSV exports every night and import them (probably with Python) into a MySQL database hosted in the office, where I can run the specific queries we need. I have no experience with databases beyond the very basics, though I've read a little about database creation and normal forms.
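A minimal sketch of that nightly import step, under stated assumptions: the column names (`class_id`, `title`, `client`, `start_date`) and the `classes` table are hypothetical, since the real export layout depends on the web application, and the standard-library `sqlite3` module stands in for MySQL so the example runs without a server.

```python
import csv
import io
import sqlite3

# Hypothetical export layout; the real column names depend on the web application.
SAMPLE_EXPORT = """class_id,title,client,start_date
101,Forklift Safety,ACME Healthcare,2024-03-04
102,First Aid,ACME Bodycare,2024-03-05
"""


def import_classes(conn, csv_file):
    """Load one CSV export into the classes table and return the row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS classes (
               class_id   INTEGER PRIMARY KEY,
               title      TEXT NOT NULL,
               client     TEXT NOT NULL,
               start_date TEXT NOT NULL
           )"""
    )
    rows = [
        (int(r["class_id"]), r["title"], r["client"], r["start_date"])
        for r in csv.DictReader(csv_file)
    ]
    # INSERT OR REPLACE keeps the nightly run idempotent: importing the same
    # export twice overwrites existing rows instead of duplicating them.
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO classes VALUES (?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the office MySQL server
imported = import_classes(conn, io.StringIO(SAMPLE_EXPORT))
import_classes(conn, io.StringIO(SAMPLE_EXPORT))  # second night, same data
total = conn.execute("SELECT COUNT(*) FROM classes").fetchone()[0]
print(imported, total)
```

With a real export, the `io.StringIO` object would be replaced by `open(path, newline="")`. Against MySQL the same idea is usually written as `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`, and most MySQL drivers use `%s` rather than `?` as the placeholder.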
We may start having international clients soon, so I want the database to not explode if/when that happens. We also currently have a couple of big corporations as clients, each with different divisions (e.g. ACME parent company, ACME healthcare division, ACME bodycare division).
The schema I have come up with is the following:
I "designed" (more like scribbled) the schema on a piece of paper, trying to keep it in third normal form. I then plugged it into MySQL Workbench and it made it all pretty for me:
[Schema diagram; source: maian.org]
Thanks for your time
Some more answers to your questions:
1) You're pretty much on target for someone who is approaching a problem like this for the first time. I think the pointers from others on this question thus far pretty much cover it. Good job!
2 & 3) The performance hit you take will depend largely on having the right indexes for your particular queries and procedures and, more importantly, on the volume of records. Unless you are talking about well over a million records in your main tables, your design looks mainstream enough that performance will not be an issue on reasonable hardware.
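To make the point about indexes concrete, here is a small sketch that checks whether a report query can use an index. The `attendance` table and its columns are made up for illustration, and `sqlite3` is used as a stand-in so the example runs anywhere; MySQL offers a comparable `EXPLAIN` statement, though its output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
# Hypothetical table: one row per attendee per class.
conn.execute(
    "CREATE TABLE attendance ("
    " attendance_id INTEGER PRIMARY KEY,"
    " class_id INTEGER NOT NULL,"
    " attendee TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO attendance (class_id, attendee) VALUES (?, ?)",
    [(i % 50, f"person-{i}") for i in range(5000)],
)

QUERY = "SELECT COUNT(*) FROM attendance WHERE class_id = ?"


def query_plan(conn, sql, params):
    """Return SQLite's plan for a query as one string (detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (7,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_attendance_class ON attendance (class_id)")
plan_after = query_plan(conn, QUERY, (7,))   # the plan now names the index

matches = conn.execute(QUERY, (7,)).fetchone()[0]
print(plan_before)
print(plan_after)
print(matches)
```

The exact wording of the plan varies between SQLite versions, so the useful signal is simply whether the index name appears in it. At a few thousand rows the difference in run time is negligible, which is consistent with the advice above.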
That said, and this relates to your question 3, given the start you have made you probably shouldn't worry too much about performance or be hypersensitive to normalization orthodoxy here. You are building a reporting server, not a transaction-based application backend, which would have a very different profile with respect to the importance of performance and normalization. A database backing a live signup and scheduling application has to be mindful of queries that take seconds to return data. A report server not only has more tolerance for complex and lengthy queries, but the strategies for improving its performance are also quite different.
For example, in a transaction-based application environment, your options for improving performance might include refactoring your stored procedures and table structures to the nth degree, or developing a caching strategy for small amounts of commonly requested data. In a reporting environment you can certainly do that too, but you can have an even greater impact by introducing a snapshot mechanism: a scheduled process runs and stores pre-configured reports, and your users read the snapshot data, so individual requests put no stress on your db tier.
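A rough sketch of such a snapshot mechanism, with assumptions labeled: the `classes` table, the `report_attendees_by_client` snapshot table, and both function names are hypothetical, and `sqlite3` again stands in for MySQL. In practice `refresh_snapshot` would be run by a scheduler (cron or similar) after the nightly import.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
# Hypothetical source table plus a snapshot table holding the finished report.
conn.executescript(
    """
    CREATE TABLE classes (
        class_id  INTEGER PRIMARY KEY,
        client    TEXT NOT NULL,
        attendees INTEGER NOT NULL
    );
    CREATE TABLE report_attendees_by_client (
        client          TEXT PRIMARY KEY,
        total_attendees INTEGER NOT NULL,
        generated_at    TEXT NOT NULL
    );
    INSERT INTO classes VALUES
        (1, 'ACME Healthcare', 12),
        (2, 'ACME Healthcare', 8),
        (3, 'ACME Bodycare', 5);
    """
)


def refresh_snapshot(conn):
    """Scheduled step: run the expensive aggregate once and store the result."""
    generated_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # replace the old snapshot atomically
        conn.execute("DELETE FROM report_attendees_by_client")
        conn.execute(
            "INSERT INTO report_attendees_by_client "
            "SELECT client, SUM(attendees), ? FROM classes GROUP BY client",
            (generated_at,),
        )


def read_report(conn):
    """Per-request step: a cheap read of precomputed rows, no aggregation."""
    return conn.execute(
        "SELECT client, total_attendees "
        "FROM report_attendees_by_client ORDER BY client"
    ).fetchall()


refresh_snapshot(conn)
report = read_report(conn)
print(report)
```

The trade-off is freshness: users see the data as of the last refresh, which is why the sketch stores a `generated_at` timestamp alongside each row.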
All of this is a long-winded rant to illustrate that the design principles and tricks you employ may differ depending on the role of the db you're creating. I hope that's helpful.